Introduction


Nuclear  architecture  is  the  new  dimension  of  regulatory  control, 
functioning  in  conjunction  with  genome  organization  and  epigenetic  marks.  A  full 
understanding  of  a  cell’s  genetic  repertoire  cannot  be  discerned  from  linear 
sequence analysis alone. Instead, we must understand the three-dimensional
nature of the human genome. Dynamic interactions occur among
DNA  elements,  which  can  regulate  gene  expression  over  large  genomic 
distances  on  a  single  chromosome,  through  DNA  looping,  or  even  between 
chromosomes.  We  propose  that  incorporating  new  knowledge  regarding  a  breast 
cancer  gene’s  spatial  interactions  (i.e.,  the  nuclear  neighborhood  within  which  the 
genes  reside)  will  yield  novel  and  more  accurate  predictions  of  breast  cancer 
susceptibility  and  suggest  innovative  therapeutic  options. 

Nuclear  architecture  is  maintained  through  proteins  and  long  noncoding 
RNAs that bind to DNA and stabilize loops and long-range interactions.


Body 

Task  1:  Characterize  physical  interactions  between  selected  breast  cancer  loci  in 
normal  and  malignant  mammary  cell  lines.  (Months  1  -  36) 

Insulin-like  growth  factor  binding  protein  3  (IGFBP3)  has  been  implicated 
in  breast  cancer  pathogenesis  (1-5).  IGFBP3  modulates  cell  growth  and  survival 
through  binding  to  insulin-like  growth  factors  I  and  II,  and  regulating  their 
bioavailability  (6).  IGFBP3  has  also  been  proposed  to  function  independently  of 
IGF  and  act  as  a  growth  modulator  (7-9).  While  correlations  between  serum 
levels  of  IGFBP3  and  breast  cancer  have  yielded  contradictory  results  (3-5,  10), 
increased levels of IGFBP3 in breast cancer tissue are correlated with a worse
prognosis and poor clinical features (1, 2).

Dysregulation  of  IGFBP3  expression  and  hypermethylation  of  its  promoter 
have  been  observed  in  many  cancers  (29).  High  levels  of  IGFBP3  expression 
were observed to increase survival of breast cancer cells exposed to
environmental  stress.  We  hypothesized  that  cancer-related  changes  in  IGFBP3 
regulation  might  coincide  with  altered  spatial  positioning  and  long-range  DNA 
interactions  contributing  to  breast  cancer  pathogenesis.  We  therefore  used  the 
IGFBP3  enhancer  as  bait  in  circular  chromosome  conformation  capture  with  high 
throughput  sequencing  (4C-seq)  in  normal  mammary  epithelial  cells  (HMEC)  and 
two breast cancer cell lines, MCF7 and MDA-MB-231, with opposite IGFBP3
expression  profiles. 

Expression of IGFBP3 is downregulated in MCF7, but upregulated in MDA-MB-231
relative to HMEC

To  better  understand  the  role  of  IGFBP3  in  breast  cancer  we  analyzed  its 
expression in primary breast cells, the estrogen receptor alpha (ERα) positive
breast cancer cell line MCF7, and the triple-negative breast cancer cell line
MDA-MB-231. IGFBP3 expression was increased nearly 3-fold in MDA-MB-231, and
reduced 3.8-fold in MCF7, relative to HMEC (Figure 1A). To evaluate whether
DNA methylation correlated with the changes in expression, we examined the
methylation status of the IGFBP3 promoter by bisulfite pyrosequencing. The
IGFBP3 promoter was hypermethylated (91% CpG methylation) in MCF7
compared with 11% and 10% CpG methylation in HMEC and MDA-MB-231,
respectively (Figure 1B).
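
The fold changes quoted above are ratios against HMEC. As a minimal sketch of that arithmetic, assuming expression values have already been normalized to a reference gene (the normalization scheme is not stated in this report, and the function name is hypothetical):

```python
def signed_fold_change(sample: float, control: float) -> float:
    """Return +x for an x-fold increase and -x for an x-fold decrease."""
    if sample <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    ratio = sample / control
    # Ratios below 1 are reported as a negative reciprocal (e.g. 0.26 -> -3.8).
    return ratio if ratio >= 1 else -1.0 / ratio


# Illustrative inputs only, not measured data: control expression set to 1.0.
increase = signed_fold_change(3.0, 1.0)
decrease = signed_fold_change(1.0, 3.8)
```

Under this convention a "3.8-fold reduction" corresponds to a sample-to-control ratio of roughly 0.26.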


[Figure 1 panels. A: IGFBP3 RNA. B: IGFBP3 methylation. Samples: HMEC, MCF7, MDA-MB-231.]

Figure  1.  Expression  and  methylation  status  of  IGFBP3 

A) qRT-PCR: RNA levels of IGFBP3 were measured in MCF-7, MDA-MB-231 and
HMEC cells. Expression in cancer lines was plotted as fold change relative to HMEC.
Data represent the SEM of three independent biological replicates. B) Percent
methylation of CpG nucleotides in the IGFBP3 promoter in HMEC, MCF-7 and
MDA-MB-231. Bars represent the average percent methylation of 2-6 positions in the
IGFBP3 promoter.

EGFR interacts significantly with IGFBP3

To  identify  whether  changes  in  IGFBP3  expression  and  methylation  were 
accompanied  by  global  alteration  of  its  long-range  chromatin  interactions,  we 
performed multiplex 4C-seq in HMEC, MCF7 and MDA-MB-231. We chose as our bait a
region upstream of IGFBP3, classified as a strong enhancer in HMEC by
chromatin profiling of several distinctive features including enrichment of the
enhancer mark H3K4me1. We obtained a combined total of
approximately 12 million mapped reads for the three samples, with the majority
mapping in cis. The 4C-seq reads were binned into windows based on the
number of mappable HindIII restriction sites, ranging from 25 to 400. Regions with
an FDR below 0.01 were considered significantly interacting. The significant
long-range cis interactions for window size 100 in HMEC, MCF7 and MDA-MB-231
are diagrammed (Figure 2A). For every window size analyzed, MCF7 contained
the largest number of significant interactions, followed by MDA-MB-231 and
HMEC. With a window size of 100, there were a total of 16 significant cis
interactions in HMEC, 51 in MCF7 and 29 in MDA-MB-231. Of these interactions,
8 were common to all samples.
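
The windowing and FDR threshold described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this work: windows are taken as sliding runs of consecutive restriction sites with a step of one site, per-window p-values are assumed to come from some upstream test that is not specified here, and the Benjamini-Hochberg procedure is swapped in as one common way to control the FDR. All function names are hypothetical.

```python
from typing import List


def window_sums(site_counts: List[int], window: int) -> List[int]:
    """Sum read counts over every run of `window` consecutive restriction sites."""
    if window <= 0 or window > len(site_counts):
        return []
    total = sum(site_counts[:window])
    sums = [total]
    for i in range(window, len(site_counts)):
        # Slide by one site: add the new site, drop the oldest one.
        total += site_counts[i] - site_counts[i - window]
        sums.append(total)
    return sums


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(pvalues: List[float], fdr: float = 0.01) -> List[bool]:
    """Flag windows whose adjusted p-value falls below the FDR threshold."""
    return [q < fdr for q in benjamini_hochberg(pvalues)]
```

Defining windows by restriction-site count rather than by base pairs keeps the number of potential ligation events per window constant, which is presumably why window sizes are quoted as 25 to 400 sites.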


Figure  2. 

Among  the  significant  intrachromosomal  interactions  common  to  all  samples,  and 
across  all  window  sizes,  was  an  interaction  with  epidermal  growth  factor  receptor 
(EGFR),  another  breast  cancer  related  gene.  EGFR  is  located  approximately  9 
Mb  from  IGFBP3  on  chromosome  7.  To  examine  this  long-range  interaction  in 
more detail, we labeled the gene pair EGFR and IGFBP3 by 3D-FISH in HMEC and
breast cancer cell lines MCF7 and MDA-MB-231 (Figure 3A). To quantitate
differences in interaction frequencies at the cellular level, we measured the
center-to-center distances between the closest pairs of labeled foci. In 88% of
HMEC nuclei counted, EGFR and IGFBP3 were within 1 micron of each other
(Figure 3B). This was reduced to 56% of MCF7 nuclei, and increased to 96% of
MDA-MB-231 nuclei. To assess whether differences in spatial positioning were
accompanied by changes in expression, we measured RNA levels of EGFR in HMEC,
MCF7 and MDA-MB-231 by qRT-PCR (Figure 3C). Relative to HMEC, EGFR
expression was unchanged in MDA-MB-231, yet was reduced 35-fold to nearly
undetectable levels in MCF7 cells. In contrast to IGFBP3, the EGFR promoter
had no change in CpG methylation (data not shown).
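
The 3D-FISH scoring described above (closest pair of foci per nucleus, then the share of nuclei within 1 micron) can be sketched as below. This is a minimal illustration assuming focus centers are already available as (x, y, z) coordinates in microns; the data layout and function names are hypothetical and do not describe the imaging software actually used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # focus center (x, y, z), in microns


def closest_pair_distance(foci_a: Sequence[Point], foci_b: Sequence[Point]) -> float:
    """Smallest center-to-center distance between any focus in A and any in B."""
    return min(math.dist(a, b) for a in foci_a for b in foci_b)


def fraction_within(
    nuclei: List[Tuple[Sequence[Point], Sequence[Point]]],
    threshold_um: float = 1.0,
) -> float:
    """Fraction of nuclei whose closest A-B focus pair lies within the threshold."""
    hits = sum(closest_pair_distance(a, b) <= threshold_um for a, b in nuclei)
    return hits / len(nuclei)


# Two made-up nuclei: closest pairs are 0.5 and 3.0 microns apart.
example_nuclei = [
    ([(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)], [(0.5, 0.0, 0.0), (9.0, 9.0, 9.0)]),
    ([(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]),
]
example_fraction = fraction_within(example_nuclei)
```

Taking the minimum over all focus pairs means a nucleus is scored as interacting if at least one allele pair is close, regardless of the other alleles.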


Figure 3. (Panel A labels: HMEC, MCF7, MDA-MB-231.)


We  also  discovered  that  recurrent  breakpoints  that  map  within  HMEC  4C 
significant  hits  are  also  present  within  MCF-7  4C  significant  hits. 

Some of the most significant 4C-seq interchromosomal interactions in
HMEC included regions containing the genes BCAS1-4, located on
chromosomes 1, 17 and 20 (10). All 4 of these genes were found among the 10
most significantly enriched regions in HMEC, and the region containing BCAS1
and ZNF217 was the overall top scoring window. These interactions were also
enriched in MCF7, where they are frequently rearranged and amplified. We used
3D-FISH to investigate whether the IGFBP3-interacting BCAS genes were also in
close spatial proximity with one another prior to any oncogenic translocations
(Figure 4). We performed dual and triple labeled 3D-FISH with probes for
IGFBP3, BCAS1, BCAS3 and BCAS4 in primary HMEC cells (Figure 4A).
Center-to-center distances were measured for the closest pairs of foci for each
probe (Figure 4B). All probes targeting the BCAS genes were in close proximity,
residing within 1 micron of IGFBP3 in at least 5% of nuclei. The
BCAS3-BCAS4 and BCAS3-BCAS1 regions, which undergo translocations with
one another in MCF7, were also within 1 micron in at least 4% of normal HMEC
nuclei. These percentages are in line with reports of positive trans interacting loci
identified using other molecular assays. This suggests spatial proximity of the
BCAS genes in normal breast cells contributes to their frequent oncogenic
translocations.

Figure 4. IGFBP3 interacts with BCAS genes.

A,  Representative  triple  labeled  3D-FISH,  z-axis  projection  images  of  IGFBP3, 
BCAS3, BCAS4 (left) and IGFBP3, BCAS3, BCAS1 (right). Scale bar = 10 µm. B,
Percentage  of  nuclei  with  the  listed  pair  of  gene  loci  within  1  micron  of  each 
other.  Distances  were  measured  between  the  closest  two  foci  in  each  nucleus. 


Our  study  demonstrates  that  long-range  interactions  of  cancer-related  loci, 
including  EGFR  and  IGFBP3,  are  altered  in  breast  cancer  cells,  and  these 
alterations  are  frequently  associated  with  epigenetic  changes.  Long-range 
interactions influence chromosomal translocations, and add a further layer of
complexity to transcriptional and epigenetic regulation to coordinate gene
expression.  Therefore,  a  better  understanding  of  aberrant  chromatin  interactions 
is  needed  to  fully  understand  cancer  pathology. 


Task  2:  Utilize  a  murine  model  of  xenotropic  tumor  growth  and  metastasis 
to characterize the combinatorial contribution of multiple disease-associated
loci. (Months 12-24)

We initially planned to use shRNA to knock out predicted genes in
breast cancer cell lines and then use a mouse model to determine the effect of
the knock out on tumor growth and metastasis. We were not able to successfully
knock out the gene we were going to test (SAT1B) and, because of limitations of
time, we did not pursue the murine model.

Therefore,  we  changed  the  scope  of  this  task  to  explore  a  more  fruitful  avenue, 
and  we  elected  to  study  another  factor  that  stabilizes  long-range  chromatin 
interactions  in  breast  cancer,  a  long  noncoding  RNA  associated  with  the  IGF1 
receptor. 


IGF1R is one of the most abundantly phosphorylated receptor tyrosine
kinases in tumors (11). The insulin-like growth factor system, including the type I
IGF receptor IGF1R and the mitogenic ligands IGF-I and IGF-II, is frequently
dysregulated in breast cancer and is known to contribute to disease progression
and metastasis (12). IGF-I and IGF-II promote cell growth and survival via
IGF1R-mediated signal transduction through an intracellular tyrosine kinase
linked to the phosphatidylinositol-3 kinase (PI3K)-Akt-mammalian target of
rapamycin (mTOR) pathway. Overexpression of IGF1R activates the PI3K and
MAPK signal cascades, resulting in cell proliferation and resistance to
chemotherapeutic agents, radiation, and targeted therapies using Tamoxifen and
Herceptin (13). Therapeutic agents targeting IGF1R are currently in clinical
development (12, 14), including those that inhibit the IGF1R tyrosine kinase using
monoclonal antibodies and small molecules. However, the clinical development
of various IGF1R inhibitors has been put on hold due to lack of sufficient clinical
efficacy. Thus, the regulation of this pathway needs to be further defined to aid in
the  development  of  next  generation  regimens. 

Currently,  the  molecular  mechanisms  underlying  the  dysregulation  of  the 
IGF1R pathway in tumors remain unknown. Using a recently developed R3C
(RNA-guided Chromatin Conformation Capture) technique (15), we
identified a novel long non-coding RNA (lncRNA), IRAIN, within the IGF1R locus (16).
IRAIN is transcribed from an intragenic promoter located in the first intron of
IGF1R. IRAIN lncRNA is transcribed in an antisense orientation compared with
the IGF1R gene, and it is expressed exclusively from the paternal allele, with the
maternal allele being silenced. Interestingly, this lncRNA interacts with chromatin
DNA and is involved in the formation of an intrachromosomal enhancer/promoter
loop. In addition, IRAIN was downregulated in leukemia cell lines and in
leukocytes from patients with high-risk AML. These data suggested that IRAIN
might play a role in the dysregulation of the IGF pathway in hematopoietic
malignancies.

However,  the  function  of  this  noncoding  RNA  in  other  malignancies 
remains  to  be  explored.  The  IGF1R  pathway  is  frequently  dysregulated  in  breast 
cancer. It is unclear if IRAIN lncRNA is aberrantly imprinted in breast cancer
patients. In these experiments, we characterized the allelic expression of IRAIN
lncRNA in a cohort of breast cancer samples.

In breast cancer tissues, we found that IRAIN lncRNA was transcribed
from an intronic promoter in an antisense direction as compared to the IGF1R
coding mRNA. Unlike the IGF1R coding RNA, this noncoding RNA was
imprinted,  with  monoallelic  expression  from  the  paternal  allele.  In  breast  cancer 
tissues  that  were  informative  for  SNP  rs8034564,  there  was  an  imbalanced 
expression  of  the  two  parental  alleles,  where  the  “G”  genotype  was  favorably 
imprinted  over  the  “A”  genotype.  In  breast  cancer  patients,  IRAIN  was  aberrantly 
imprinted  in  both  tumors  and  peripheral  blood  leukocytes,  exhibiting  a  pattern  of 
allele-switch:  the  allele  expressed  in  normal  tissues  was  inactivated  and  the 
normally  imprinted  allele  was  expressed.  Epigenetic  analysis  revealed  that  there 
was  extensive  DNA  demethylation  of  CpG  islands  in  the  gene  promoter.  These 
data identify IRAIN lncRNA as a novel imprinted gene that is aberrantly regulated
in  breast  cancer. 


Task  3:  Identify  additional  genomic  sites  that  interact  with  our  selected 
3DAS  loci.  (Months  12-24) 


Recurrent  breakpoints  that  map  within  HMEC  4C  significant  hits  are  also  present 
within  MCF-7  4C  significant  hits 

We constructed a circos plot to highlight the significant interchromosomal
interactions involving the IGFBP3 enhancer in HMEC, MCF7 and MDA-MB-231
that fell within a window size of 200 (Figure 5A). There were a total of 87
significant interactions in HMEC, 194 in MCF7 and 115 in MDA-MB-231. Of
these interactions only 11 were common to all samples (Figure 5B). Because a
large  proportion  of  the  significant  4C  windows  fell  within  chromosomes  prone  to 
rearrangements,  fusions  and  amplifications,  we  compared  the  locations  of  a  list 
of  157  breakpoints  mapped  in  MCF7  cells  to  our  significant  interchromosomal  4C 
windows.  The  breakpoints  could  be  categorized  as  2  distinct  types.  The  first 
category  contained  the  majority  of  breakpoints,  which  were  dispersed  throughout 
the  genome  in  regions  of  low  copy  repeats.  The  second  category  included  MCF7 
breakpoints  falling  within  four  highly  amplified  regions  located  on  chromosomes 
1, 3, 17 and 20. We found that breakpoints falling within our 4C windows were
almost exclusively in the latter category. We considered a subset of 74
breakpoints, described as interchromosomal rearrangements, and determined
how  many  of  these  fell  within  significant  4C  windows  in  MCF-7.  As  a  comparison 
we  also  mapped  these  breakpoints  to  our  significant  4C  windows  in  HMEC.  A 
total  of  29  breakpoint  ends  mapped  within  significant  windows  in  HMEC,  as 
compared to 61 in the MCF-7 line. Interestingly, all but 1 of the breakpoints within
HMEC 4C windows were also present within MCF-7 4C hits. Also, when we
compared the number of breakpoints for which both ends
mapped to a 4C hit, the percentage was nearly twice as high in the breast
cancer cell line MCF-7 as in HMEC.
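
The breakpoint comparison above amounts to testing whether each breakpoint end falls inside a significant 4C window. A minimal sketch, assuming windows and breakpoint ends are given as simple genomic coordinates (the tuple layouts, half-open interval convention, and function names are hypothetical, and the coordinates in the example are made up):

```python
from typing import Dict, List, Tuple

Window = Tuple[str, int, int]  # (chromosome, start, end), end exclusive
BreakEnd = Tuple[str, int]     # (chromosome, position)


def in_any_window(end: BreakEnd, windows: List[Window]) -> bool:
    """True if the breakpoint end lies inside at least one window."""
    chrom, pos = end
    return any(c == chrom and s <= pos < e for c, s, e in windows)


def classify_breakpoints(
    breakpoints: List[Tuple[BreakEnd, BreakEnd]],
    windows: List[Window],
) -> Dict[str, int]:
    """Count ends inside windows, and breakpoints with both ends inside."""
    ends_hit = 0
    both_hit = 0
    for first, second in breakpoints:
        hits = int(in_any_window(first, windows)) + int(in_any_window(second, windows))
        ends_hit += hits
        both_hit += int(hits == 2)
    return {"ends_in_windows": ends_hit, "both_ends_in_windows": both_hit}


# Made-up coordinates for illustration only.
example_windows = [("chr1", 100, 200), ("chr20", 0, 50)]
example_breakpoints = [
    (("chr1", 150), ("chr20", 10)),   # both ends inside windows
    (("chr1", 250), ("chr20", 49)),   # one end inside
    (("chr3", 5), ("chr17", 5)),      # neither end inside
]
example_counts = classify_breakpoints(example_breakpoints, example_windows)
```

Running the same breakpoint list against the HMEC and MCF-7 window sets separately would give the per-line counts that the paragraph compares.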


Figure 5


KEY RESEARCH ACCOMPLISHMENTS


o  Development  of  4C-seq  assays  for  breast  cancer  cells 
o  Demonstration  that  breast  cancer  cells  differ  from  normal  cells  and  from 
each  other  in  their  “interactome” 

o Discovery of an imprinted long noncoding RNA (IRAIN) that is crucial in
forming a loop between the IGF1 receptor promoter and enhancer and
which is dysregulated in breast cancer.


REPORTABLE  OUTCOMES 

o  Manuscripts: 

Zeitz MJ, Ay F, Heidmann JD, Lerner PL, Noble WS, Steelman BN and
Hoffman AR. Genomic interaction profiles in breast cancer reveal altered
chromatin architecture. PLoS One 2013 Sep 3; 8(9): e73974. doi:
10.1371/journal.pone.0073974

Kang L, Sun J, Wen X, Cui J, Wang G, Hoffman AR, Hu JF and Li W.
Aberrant allele-switch imprinting of a novel IGF1R intragenic antisense
non-coding RNA in breast cancers. Eur J Cancer 51: 260-270, 2015.

o  Licenses:  none 
o  Degrees  obtained:  n/a 

o  Development  of  cell  lines,  tissue  or  serum  repositories:  none 
o  Informatics:  new  sets  of  data  regarding  interchromosomal  interactions 
o  Funding  applied  for  based  on  this  award:  none 
o  Employment  or  research  opportunities:  none 


CONCLUSIONS 

Physical  contact  is  a  prerequisite  for  chromosomal  translocations.  Both 
cytogenetic  and  molecular  evidence  suggests  spatial  proximity  influences 
recurrent chromosomal translocations. Long noncoding RNAs help stabilize
long-range intra- and interchromosomal interactions. We have described a novel long
noncoding RNA derived from the IGF1 receptor locus that stabilizes a loop
between the IGF1R promoter and its enhancer, and which is dysregulated in breast
cancer.

Our  data  also  demonstrate  that  there  are  numerous  breast  cancer  genes 
present  within  significantly  interacting  regions  in  normal  breast  cells.  These  data 
suggest  the  possibility  that  certain  loci  in  the  genome  form  “hubs”  of  preferentially 
interacting loci. These hubs may have a functional purpose, such as being
co-transcribed in “transcription factories.” It is likely that these interacting genes
regulate each other’s transcription and that changes in long-range interactions in
cancer  may  lead  to  detrimental  changes  in  gene  expression.  Breakpoint  analysis 
suggests  that  when  an  interacting  region  undergoes  a  translocation  an  additional 
interaction  detectable  by  4C  is  gained.  Overall,  our  data  from  multiple  lines  of 
evidence suggest an important role for long-range chromosomal interactions in
the pathogenesis of breast cancer, and it is possible that new gene targets for
diagnosis or therapeutics may become evident from the study of interactome
informatics.

Abstract

Gene transcription can be regulated by remote enhancer regions through chromosome looping either in cis or in trans.
Cancer  cells  are  characterized  by  wholesale  changes  in  long-range  gene  interactions,  but  the  role  that  these  long-range 
interactions  play  in  cancer  progression  and  metastasis  is  not  well  understood.  In  this  study,  we  used  IGFBP3,  a  gene  involved 
in  breast  cancer  pathogenesis,  as  bait  in  a  4C-seq  experiment  comparing  normal  breast  cells  (HMEC)  with  two  breast  cancer 
cell  lines  (MCF7,  an  ER  positive  cell  line,  and  MDA-MB-231,  a  triple  negative  cell  line).  The  IGFBP3  long-range  interaction 
profile  was  substantially  altered  in  breast  cancer.  Many  interactions  seen  in  normal  breast  cells  are  lost  and  novel 
interactions appear in cancer lines. We found that in HMEC, the breast carcinoma amplified sequence gene family (BCAS) 1-4
were among the top 10 most significantly enriched regions of interaction with IGFBP3. 3D-FISH analysis indicated that the
translocation-prone  BCAS  genes,  which  are  located  on  chromosomes  1,  17,  and  20,  are  in  close  physical  proximity  with 
IGFBP3  and  each  other  in  normal  breast  cells.  We  also  found  that  epidermal  growth  factor  receptor  (EGFR),  a  gene  implicated 
in  tumorigenesis,  interacts  significantly  with  IGFBP3  and  that  this  interaction  may  play  a  role  in  their  regulation.  Breakpoint 
analysis  suggests  that  when  an  IGFBP3  interacting  region  undergoes  a  translocation  an  additional  interaction  detectable  by 
4C  is  gained.  Overall,  our  data  from  multiple  lines  of  evidence  suggest  an  important  role  for  long-range  chromosomal 
interactions in the pathogenesis of cancer.

Introduction

It is now widely recognized that the spatial organization of the
genome and not only its linear sequence is essential for normal
genome function [1]. Recent breakthroughs combining
high-throughput DNA sequencing and molecular assays have
revolutionized our understanding of chromatin organization [2,3,4].
Three-dimensional chromatin structure is important in the
regulation of transcription [5], and in the control of epigenetic
states (including the regulation of imprinted genes) by means of
chromosome looping between distant regulatory regions on the
same or on different chromosomes [6,7]. Dynamic, long-range
interactions have been observed to regulate gene expression and
contribute to the developmental processes of T cell differentiation
and X inactivation, and may play a role in tumorigenesis
[7,8,9,10,11]. The interchromosomal interaction between the Ifng
promoter on chromosome 10 and the TH2 cytokine gene locus on
chromosome 11 in naive T cells maintains both loci in a
configuration poised for rapid transcription and is thought to
facilitate the developmental choice between TH1 or TH2 cells [8].
Transient homologous pairing of X inactivation centers early in
development is crucial for correct X chromosome dosage
compensation in mammalian females [9,10]. We have shown that
Igf2 on chromosome 7 interacts with the Wsb1/Nf1 locus on
chromosome 11, and disruption of this interaction results in
decreased expression of Wsb1 and Nf1 [7]. We also observed a
substantial alteration in chromatin structure within human cancers
that have lost IGF2 imprinting, resulting in a striking loss of
long-range interactions across the IGF2/H19 locus [11]. These studies
indicate that a better understanding of intricate 3D chromatin
organization is crucial to understanding human diseases,
particularly cancer, in which genomic instability and dysregulation are
widespread.

Breast  cancer  is  a  complex  disease  that  involves  alterations  in 
both  genetic  and  epigenetic  factors  [12,13,14].  While  numerous 
genetic  mutations,  translocations  and  aberrant  DNA  methylation 
have been reported in breast cancer, the role of long-range
interactions during cancer progression remains elusive. Recent
evidence suggests that genome organization is altered early in
breast tumorigenesis [15]. Cancer-related genes were observed to
change their radial positions in a cell culture model of early breast
tumor development [15]. Changes in radial position of
cancer-related genes were also observed in breast tumor tissue samples,
and were not caused by genomic instability [16].

Insulin-like growth factor binding protein 3 (IGFBP3) has been
implicated in breast cancer pathogenesis [17,18,19,20,21].
IGFBP3 modulates cell growth and survival by binding to
insulin-like growth factors I and II, and regulating their
bioavailability [22]. IGFBP3 has also been proposed to function
independently of IGF-I or IGF-II and act as a growth modulator
[23,24,25]. While correlations between serum levels of IGFBP3
and breast cancer have yielded contradictory results [19,20,21,26],
increased levels of IGFBP3 in breast cancer tissue are correlated
with a worse prognosis and poor clinical features [17,18].

Dysregulation of IGFBP3 expression and hypermethylation of its
promoter have been observed in many cancers [27]. Increased
IGFBP3 expression has been shown to enhance survival of breast
cancer cells exposed to environmental stress [28]. Conversely, a
mouse model of prostate cancer crossed with a knockout of Igfbp3
displayed a significant increase in metastasis in double mutant
animals. In vitro assays of prostate cell lines derived from these
mouse lines also indicated a more aggressive cancer phenotype in
IGFBP3-deficient cells [29]. We sought to explore global
differences of IGFBP3 long-range interaction profiles between
normal breast cells and breast cancer cell lines. We hypothesized
that cancer-related changes in IGFBP3 regulation and epigenetic
modification might coincide with altered spatial positioning and
long-range DNA interactions contributing to breast cancer
pathogenesis. We therefore used the IGFBP3 enhancer as bait in
circular chromosome conformation capture with high-throughput
sequencing (4C-seq) in normal human mammary epithelial cells
(HMEC) and two breast cancer cell lines, MCF7 and
MDA-MB-231. MCF7 and MDA-MB-231 represent distinct breast cancer
subtypes. MCF7 is a human breast adenocarcinoma cell line
positive for estrogen receptor alpha, and MDA-MB-231 is a
human breast carcinoma cell line negative for estrogen and
progesterone receptors as well as HER2. The IGFBP3 promoter
displays hypermethylation, and there is reduced IGFBP3
expression in MCF7, while in MDA-MB-231, the promoter is relatively
hypomethylated, and IGFBP3 is overexpressed compared to
HMEC.

In this study, we examined IGFBP3 long-range interactions and
show that the three-dimensional structure of the genome changes
dramatically in breast cancer. Our data suggest a possible role for
long-range chromatin interactions in the pathogenesis of breast
cancer as well as in the formation of translocations often seen in
malignant cells.

Results 

Expression  of  IGFBP3  is  Downregulated  in  MCF7,  but 
Upregulated  in  MDA-MB-231  Relative  to  HMEC 

To better understand the role of IGFBP3 in breast cancer, we
analyzed its expression in primary breast cells, the estrogen
receptor alpha (ERα) positive breast cancer cell line MCF7, and
the triple-negative breast cancer cell line MDA-MB-231. IGFBP3
expression was increased nearly 3-fold in MDA-MB-231, and
reduced 3.8-fold in MCF7, relative to HMEC (Figure 1A). To
evaluate whether DNA methylation correlated with the changes in
expression, we examined the methylation status of the IGFBP3
promoter by bisulfite pyrosequencing. The IGFBP3 promoter was
hypermethylated (91% CpG methylation) in MCF7 compared
with 11% and 10% CpG methylation in HMEC and
MDA-MB-231, respectively (Figure 1B).

EGFR  Interacts  Significantly  with  IGFBP3 

To identify whether changes in IGFBP3 expression and
methylation were accompanied by global alteration of its
long-range chromatin interactions, we performed multiplex 4C-seq in
HMEC, MCF7 and MDA-MB-231. We chose as our bait a region
upstream of IGFBP3 classified as a strong enhancer in HMEC by
chromatin profiling of several distinctive features, including
enrichment of the enhancer marks H3K4me1 and H3K4me2
and the active regulatory H3K9ac and H3K27ac marks (Figure
S1) [30]. We obtained a combined total of approximately 12
million mapped reads from the three cell lines, with the majority
mapping in cis (Table S1). The 4C-seq reads were binned into
windows based on the number of mappable HindIII restriction
sites, ranging from 25 to 400. Regions with a false discovery rate
(FDR) below 0.01 (see Methods) were considered to be
significantly interacting. The significant long-range cis interactions for
window size 100 in HMEC, MCF7 and MDA-MB-231 are
diagrammed in Figure 2A. For every window size analyzed,
MCF7 contained the largest number of significant long-range
intrachromosomal interactions, followed by MDA-MB-231 and
HMEC. Using a window size of 100, there were a total of 16
significant cis long-range interactions in HMEC, 51 in MCF7 and
29 in MDA-MB-231. Of these interactions, 8 were common to all
3 cell lines, indicating a 50% conservation of all high-confidence
long-range interactions from HMEC (Figure 2B). Numerous novel
long-range interactions were observed in each cancer cell line, and
some long-range interactions found in normal cells were lost in
each cancer cell line.
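
The 50% conservation figure follows from 8 shared interactions out of 16 in HMEC. As a small sketch of that set arithmetic, assuming each significant interaction is represented by an identifier such as a window index (the identifiers below are placeholders chosen only to reproduce the reported counts, not real windows, and the function name is hypothetical):

```python
def conservation(reference: set, *others: set) -> float:
    """Fraction of reference interactions also present in every other set."""
    if not reference:
        raise ValueError("reference set is empty")
    shared = reference.intersection(*others)
    return len(shared) / len(reference)


# Placeholder identifiers sized to match the reported totals (16, 51, 29)
# with 8 interactions common to all three cell lines.
hmec = set(range(16))
mcf7 = set(range(8)) | set(range(100, 143))
mda_mb_231 = set(range(8)) | set(range(200, 221))

shared_fraction = conservation(hmec, mcf7, mda_mb_231)
```

The placeholder sets deliberately share nothing beyond the 8 common interactions; the real pairwise overlaps between cell lines are not stated in this passage.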

Among the significant intrachromosomal interactions common
to all samples, and across all window sizes, was an interaction with
epidermal growth factor receptor (EGFR), another breast cancer
related gene. EGFR is located approximately 9 Mb from IGFBP3
on chromosome 7. To examine this long-range interaction in more
detail, we labeled the gene pair EGFR and IGFBP3 by 3D FISH in
HMEC and the breast cancer cell lines MCF7 and MDA-MB-231
(Figure 3A). To quantitate differences in interaction frequencies at
the cellular level, we measured the center-to-center distances
between the closest pairs of labeled foci. In 88% of HMEC nuclei
counted, EGFR and IGFBP3 were within 1 micron of each other,
indicating frequent interactions (Figure 3B). This interaction
frequency was only 56% in MCF7 nuclei, but was 96% in
MDA-MB-231 nuclei. To assess whether differences in spatial
positioning were accompanied by changes in expression, we
measured RNA levels of EGFR in HMEC, MCF7 and MDA-MB-231
by qRT-PCR (Figure 3C). Relative to HMEC, EGFR
expression was unchanged in MDA-MB-231, yet it was reduced
35-fold to nearly undetectable levels in MCF7 cells. In contrast to
IGFBP3, the expression change in EGFR was not accompanied by
a change in CpG methylation in the EGFR promoter among the
three cell lines (data not shown). This suggests the difference in
EGFR expression could be driven in part by chromatin architecture
rather than methylation. In MCF7, the reduction in long-range
interaction frequency with EGFR provides the opportunity
for IGFBP3 to form additional contacts. This may partially explain
the gain of 35 unique intrachromosomal interactions in MCF7
cells compared to HMEC.
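
The proximity scoring used for the 3D FISH measurements (closest pair of foci per nucleus, scored against a 1 micron cutoff) can be sketched as follows. This is a minimal illustration and not the image-analysis pipeline used in the study: the function names, the (x, y, z) coordinate format in microns, and the example foci are all hypothetical.

```python
import math
from itertools import product

def closest_pair_distance(foci_a, foci_b):
    """Smallest center-to-center distance (microns) between any focus of
    locus A and any focus of locus B within a single nucleus."""
    return min(math.dist(a, b) for a, b in product(foci_a, foci_b))

def fraction_within(nuclei, threshold_um=1.0):
    """Fraction of nuclei whose closest A-B pair lies within threshold_um."""
    hits = sum(closest_pair_distance(a, b) <= threshold_um for a, b in nuclei)
    return hits / len(nuclei)

# Hypothetical nuclei: each entry is (foci of locus A, foci of locus B),
# with two alleles per locus given as (x, y, z) centers in microns.
nuclei = [
    ([(0.0, 0.0, 0.0), (5.0, 5.0, 1.0)], [(0.3, 0.4, 0.0), (9.0, 2.0, 1.0)]),
    ([(1.0, 1.0, 1.0), (6.0, 2.0, 0.5)], [(4.0, 4.0, 1.0), (8.0, 8.0, 2.0)]),
]

print(fraction_within(nuclei))
```

Only the closest pair is scored, so a nucleus counts as interacting when a single allele pair is in proximity.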

Interchromosomal  Rearrangements  Involving  IGFBP3 
Interacting  Regions  Facilitate  an  Increase  in  Long-Range 
Interactions  in  MCF7 

We constructed circos plots to highlight the significant
interchromosomal interactions involving the IGFBP3 enhancer in
HMEC, MCF7 and MDA-MB-231 that fell within a window size
of 200 (Figure 4A, Figure S2). There were a total of 87 significant
interactions in HMEC, 194 in MCF7 and 115 in MDA-MB-231.
Of these interactions only 11 were common to all samples
(Figure 4B, Table 1). Because a large proportion of the significant
Figure 1. Expression and methylation status of IGFBP3. A, qRT-PCR: RNA levels of IGFBP3 were measured in MCF7, MDA-MB-231 and HMEC
cells. Expression in cancer lines was plotted as fold change relative to HMEC. Data represent the SEM of three independent biological replicates. B,
Percent methylation of CpG nucleotides in the IGFBP3 promoter in HMEC, MCF7 and MDA-MB-231. Bars represent the average percent methylation of
4 positions in the IGFBP3 promoter.
doi:10.1371/journal.pone.0073974.g001


4C windows fell within chromosome regions prone to rearrangements,
fusions and amplifications, we compared the locations of
157 breakpoints mapped in MCF7 cells [31] to the list of regions
that participated in significant interchromosomal interactions. We
limited our analysis to the relationship between interactions
in normal HMEC and known breakpoints in MCF7 because it was
the only cell line with comprehensive breakpoint data available.
This allows for the correlation of interactions pre- and post-breakage.
The MCF7 breakpoints could be categorized as 2
distinct types. The first category contains the majority of
breakpoints, which are dispersed throughout the genome in
regions of low copy repeats. The second category includes MCF7
breakpoints falling within four highly amplified regions located on
chromosomes 1, 3, 17 and 20. We found that breakpoint regions
that also participated in interchromosomal interactions were
almost exclusively in the latter category. We then considered a
subset of 74 MCF7 breakpoints, described as interchromosomal
rearrangements, and determined how many were associated with
long-range chromatin interactions in HMEC and MCF7 cell lines
(Table 2). A total of 29 breakpoint ends mapped within significant
windows in HMEC, as compared to 61 in the MCF7 line. All but
one of the breakpoint ends within HMEC 4C windows were also
present within MCF7 4C windows. Importantly, when we
compared the number of breakpoints for which both ends of the
breakpoint mapped to a 4C hit, the percentage was nearly twice as
high in the breast cancer cell line MCF7 as in HMEC. This
suggests that when an IGFBP3 interacting region undergoes a
translocation involving a different chromosome, the IGFBP3
interaction is not lost, but instead the translocation brings into
proximity  an  additional  interaction  detectable  by  4C. 
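
The comparison of breakpoint ends with significant 4C windows amounts to an interval lookup, sketched below. This is an illustrative sketch only: the two windows are taken from Table 1, but the breakpoint coordinates and the function names are hypothetical, and the study's own overlap procedure is not described in enough detail to reproduce here.

```python
def in_any_window(chrom, pos, windows):
    """True if (chrom, pos) lies inside any (chrom, start, end) window."""
    return any(c == chrom and s <= pos <= e for c, s, e in windows)

def summarize_breakpoints(breakpoints, windows):
    """Return (number of breakpoint ends inside windows,
               number of breakpoints with both ends inside windows)."""
    ends_in_windows = 0
    both_ends = 0
    for end_a, end_b in breakpoints:
        a = in_any_window(*end_a, windows)
        b = in_any_window(*end_b, windows)
        ends_in_windows += a + b   # booleans add as 0 or 1
        both_ends += a and b
    return ends_in_windows, both_ends

# Two significant windows from Table 1 (hg19 coordinates).
windows = [
    ("chr20", 51905176, 52752692),
    ("chr17", 58766326, 59643391),
]

# Hypothetical interchromosomal breakpoints, each with two ends.
breakpoints = [
    (("chr17", 59000000), ("chr20", 52000000)),  # both ends in windows
    (("chr17", 59100000), ("chr1", 1000000)),    # one end in a window
    (("chr3", 5000000), ("chr1", 2000000)),      # neither end in a window
]

print(summarize_breakpoints(breakpoints, windows))
```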

Breast  Carcinoma  Amplified  Sequence  (BCAS1-4)  Genes 
Interact  Significantly  with  IGFBP3  and  Each  Other  in 
Normal  Breast  Cells 

Some of the most significant 4C-seq interchromosomal interactions
in HMEC included regions containing the genes BCAS1-4,
located on chromosomes 1, 17 and 20. All 4 of these genes were
found among the 10 most significantly enriched regions in HMEC,
and the region containing BCAS1 and ZNF217 was the overall top
scoring window. These interactions were also enriched in MCF7,
where they are frequently rearranged and amplified (Table 3). We
used 3D FISH to investigate whether the IGFBP3 interacting
BCAS genes were also in close spatial proximity with one another
prior to any oncogenic translocations (Figure 5). We performed
dual- and triple-labeled 3D FISH with probes for IGFBP3, BCAS1,
BCAS3 and BCAS4 in primary HMEC cells (Figure 5A). Center-to-center
distances were measured for the closest pairs of foci for each
probe (Figure 5B). All probes targeting the BCAS genes were in
close proximity to IGFBP3, residing within 1 micron of it
in at least 5% of nuclei. The BCAS3-BCAS4 and BCAS3-BCAS1
regions, which undergo translocations with one another in MCF7
[31], were also within 1 micron in at least 4% of normal HMEC
nuclei. These percentages are in line with reports of positive trans
interacting loci identified using other molecular assays [32,33].
This suggests spatial proximity of the BCAS genes in normal breast
cells contributes to their frequent oncogenic translocations.

Methylated  Promoters  in  Breast  Cancer  Disproportionally 
Fall  within  4C  Windows 

Using genome-wide CpG methylation data from Sproul et al.
[34], we analyzed the distribution of methylated promoters in our
4C data sets. CpG sites with a value equal to or greater than 0.8 were
considered methylated. Consistent with an increase in global CpG
methylation in breast cancer, the total number of methylated sites
was greater in MCF7 (3847 sites) and MDA-MB-231 (3282 sites),
compared with HMEC (374 sites). There was a significant increase
in the proportion of methylated promoters that participated in
long-range interactions with IGFBP3 in both breast cancer cell
lines relative to HMEC. This increase was more pronounced in
MCF7 cells, where IGFBP3 itself is hypermethylated (Table S2).
After correcting for the total number of methylated sites, there was
a 3.77-fold (Fisher's exact test, one-sided p value 4.742 × 10^-9) and
2.85-fold (Fisher's exact test, one-sided p value 1.122 × 10^-5)
increase in methylated promoters located within our 4C windows
in MCF7 and MDA-MB-231, respectively.
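
For readers unfamiliar with the test, a one-sided Fisher's exact test on a 2x2 table can be sketched in pure Python as below. This is an assumption-laden illustration: the layout of the contingency table (cell line by inside/outside 4C windows) is one plausible arrangement rather than the documented one, the in-window counts (120 and 3) are invented, and only the totals of 3847 and 374 methylated sites come from the text. The function names are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (greater) Fisher's exact p value for the table
    [[a, b], [c, d]]: P(X >= a) under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)

def fold_enrichment(a, b, c, d):
    """Ratio of the in-window proportions of the two rows."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts of methylated promoters (assumed table layout):
#              inside 4C windows   outside 4C windows
# MCF7               120                 3727          (total 3847)
# HMEC                 3                  371          (total 374)
a, b, c, d = 120, 3727, 3, 371

print(fold_enrichment(a, b, c, d))
print(fisher_one_sided(a, b, c, d))
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the tail sum; `scipy.stats.fisher_exact` with `alternative="greater"` would be the usual library route.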
Figure 2. Intrachromosomal interaction profile of IGFBP3. A, Spider plot showing the significant long-range interactions of the IGFBP3
enhancer across chromosome 7 for a window size of 100 consecutive restriction fragments in HMEC (blue), MDA-MB-231 (red), and MCF7 (green). Mb
position is plotted. Tick marks on chromosome 7 represent gene locations with positive strand genes on top and negative strand genes on bottom. B,
Domainograms illustrating the significance of intrachromosomal interactions for window sizes ranging from 3 to 200 consecutive fragments for each
cell line. The color represents -log(p value) of the calculated significance score ranging from black (not significant) to white (most significant). The
gray region corresponds to the centromere of chromosome 7, which lacks HindIII cut sites.
doi:10.1371/journal.pone.0073974.g002
Figure 3. Interaction frequency of IGFBP3 with the breast cancer related gene EGFR by 3D FISH. A, 3D FISH labeling of breast cancer
related loci in HMEC, MCF7, MDA-MB-231. BAC probe combinations: IGFBP3 (green) and EGFR (red) n = 50, DAPI DNA stain (blue), boxes in lower right
corner contain a magnified view of each interaction. Scale bar = 10 μm. B, Cumulative percentage of distances between IGFBP3 and EGFR loci.
Distances were measured between the closest two foci in each nucleus. C, qRT-PCR: RNA levels of EGFR measured in MCF7, MDA-MB-231 and HMEC
cells. Expression in cancer lines plotted as fold change relative to HMEC. Data represent the SEM of three independent biological replicates.
doi:10.1371/journal.pone.0073974.g003


Discussion 

Chromatin structure plays a key role in establishing and
maintaining tissue-specific gene expression profiles throughout
development. Epigenetic modification of chromatin can influence
DNA packaging and accessibility to trans-acting regulatory factors.
Active regulatory regions are maintained in open chromatin,
characterized by nucleosome depletion and DNase I hypersensitivity
[35]. A vast number of transcription factor binding sites are
situated far from any transcription start site, and interactions
occurring among distant regulatory elements can regulate gene
expression [30]. Long-range interactions between active regulatory
elements may therefore provide a means to fine-tune gene activity.

The importance of long-range interactions may be especially
relevant in cancer, where genomic instability and extensive
epigenetic modification of chromatin are common. Rickman et al.,
for example, found that overexpression of an oncogenic transcription
factor in normal cells leads to large-scale changes in
chromatin organization [36]. We have seen that there is a
dramatic change in long-range interactions in cancer cells
compared with cells derived from normal tissues. We have
previously shown that loss of IGF2 imprinting in cancer is
accompanied by loss of normal long-range intrachromosomal
interactions involving the IGF2/H19 locus [11]. In this study we
have expanded our view of long-range interactions in cancer by
exploring the genome-wide interaction profile of IGFBP3.

IGFBP3 plays a major role in IGF signaling through binding the
majority of circulating IGF-I and IGF-II, and it may also function
independently in a growth-stimulating or inhibitory fashion
depending on the system studied. We observed that IGFBP3
interacts with epidermal growth factor receptor (EGFR) in all 3 cell
lines. EGFR is a receptor tyrosine kinase whose dysregulation can
promote tumorigenesis, and nuclear EGFR has been shown to
function as a transcription factor to activate genes required for cell
Figure 4. Interchromosomal interaction profile of IGFBP3. A, Circos plots showing the distribution of significant interchromosomal interactions
involving IGFBP3 in HMEC, MCF7 and MDA-MB-231. Grey lines in MCF7 plot represent interchromosomal translocations, adapted from Hampton et al.
[31], falling within windows of significant 4C interactions. B, Venn diagram showing the number of unique and overlapping significant
interchromosomal interactions for a window size of 200 consecutive restriction fragments.
doi:10.1371/journal.pone.0073974.g004


Table 1. Common trans interactions among all samples.

Window                       Genes
chr20:51905176-52752692      BCAS1, ZNF217, TSHZ2, SUMO1P1, MIR4756
chr17:58766326-59643391      BCAS3, TBX2, C17orf82, TBX4
chr20:48615985-49541607      BCAS4, LINC00651, UBE2V1, TMEM189, CEBPB, LOC284751, PTPN1, MIR645, FAM65C, PARD6B, ADNP
chr20:46815739-47725424      LINC00494, PREX1, ARFGEF2, CSE1L
chr1:144919188-145816358     PDE4DIP, SEC22B, NOTCH2NL, NBPF10, HFE2, TXNIP, POLR3GL, ANKRD34A, LIX1L, RBM8A, GNRHR2, PEX11B, ITGA10, ANKRD35, PIAS3, NUDT17, POLR3C, RNF115, CD160, PDZK1, GPR89A
chr20:45119530-45995741      ZNF334, OCSTAMP, SLC13A3, TP53RK, SLC2A10, EYA2, MIR3616, ZMYND8, LOC100131496
chr3:196975652-197787067     DLG1, MIR4797, DLG1-AS1, BDH1, LOC220729, KIAA0226, MIR922, FYTTD1, LRCH3, IQCG, RPL35A, LMLN, ANKRD18DP
chr1:200591661-201448561     DDX59, CAMSAP2, GPR25, C1orf106, KIF21B, CACNA1S, ASCL5, TMEM9, IGFN1, PKP1, TNNT2, LAD1, TNNI1, PHLDA3
chr2:24898227-25798560       NCOA1, PTRHD1, CENPO, ADCY3, DNAJC27, DNAJC27-AS1, EFR3B, POMC, DNMT3A, MIR1301, DTNB
chr4:1134384-2497968         SPON2, LOC100130872, CTBP1, CTBP1-AS1, MAEA, UVSSA, CRIPAK, FAM53A, SLBP, TMEM129, TACC3, FGFR3, LETM1, WHSC1, SCARNA22, WHSC2, MIR943, C4orf48, NAT8L, POLN, HAUS3, MXD4, MIR4800, ZFYVE28, LOC402160, RNF4
chr9:132166602-133421900     LOa00S06m, C9orf50, NTMT1, ASB6, PRRX2, PTGES, TOR1B, TOR1A, C9orf78, USP20, FNBP1, GPR107, NCS1, ASS1

doi:10.1371/journal.pone.0073974.t001
Table 2. Distribution of MCF7 translocation breakpoints.

                                                         HMEC     MCF7
4C windows containing at least one breakpoint end        11.5%    13.4%
Total number of breakpoint ends mapping to 4C windows    29       61
Number of breakpoint ends common to HMEC and MCF7        28       28
Breakpoints with both ends in 4C windows                 34.5%    68.9%

doi:10.1371/journal.pone.0073974.t002

proliferation [37,38]. Recently, The Cancer Genome Atlas Network
identified four major subtypes of breast cancer based on extensive
genomic analyses. They found high-level EGFR and phosphorylated
EGFR to be associated with a subset of breast cancers with
HER2 enrichment, suggesting possible targets for combined


Table 3. BCAS gene loci are located in significantly interacting 4C windows.

Cell Line     BCAS1 chr20   BCAS2 chr1   BCAS3 chr17   BCAS4 chr20
HMEC          1             10           3             5
MCF7          4             1            7             14
MDA-MB-231    1             NA           8             5

Numbers represent rank by p value, with 1 being the most significant
interaction.

doi:10.1371/journal.pone.0073974.t003

therapy [39]. Crosstalk exists between insulin-like growth factor 1
receptor (IGF1R) and other signaling receptors, including EGFR.
Inhibiting either IGF1R or EGFR results in activation of the
reciprocal receptor, suggesting that combined inhibition of both
Figure 5. IGFBP3 interacts with BCAS genes. A, Representative triple-labeled 3D FISH, z-axis projection images of IGFBP3, BCAS3, BCAS4 (left) and
IGFBP3, BCAS3, BCAS1 (right). Scale bar = 10 μm. B, Percentage of nuclei with the listed pair of gene loci within 1 micron of each other. Distances were
measured between the closest two foci in each nucleus.
doi:10.1371/journal.pone.0073974.g005
pathways may yield enhanced tumor therapy [40]. There is also
interplay between IGFBP3 and EGFR in cancer cells. The initial
reaction of ER-positive T47D breast cancer cells to IGFBP3 is
inhibitory, yet prolonged expression of IGFBP3 cDNA stimulates
growth. Chronic exposure of cells to IGFBP3 over many passages
in vitro also led to an increase in EGFR protein levels, and
enhanced the response to EGF as demonstrated by an increase in
both phosphorylated EGFR and DNA synthesis. Furthermore,
xenograft tumors in mice that expressed IGFBP3 showed
enhanced growth and increased levels of EGFR [41]. Conversely,
overexpression of EGFR in primary keratinocytes resulted in a
4.4-fold induction of IGFBP3 [42]. Our 4C-seq and 3D FISH data
indicate IGFBP3 and EGFR, separated by 9 Mb, are often in close
spatial proximity (Figure 3). Spatial proximity of loci residing on
the same chromosome is influenced to some extent by their linear
separation in base pairs; it is therefore difficult to make
comparisons between studies of loci with differing amounts of
linear separation. Nonetheless, the number of nuclei scored by 3D
FISH containing IGFBP3 and EGFR in close proximity can be
considered high in our cell lines, especially HMEC and
MDA-MB-231, where nearly all cells had at least one allele demonstrating
proximity within 1 micron. Importantly, this high interaction
frequency was not due solely to linear distance between the genes,
as a large number of interactions occurred monoallelically. This
can be observed in HMEC and MDA-MB-231 3D FISH images
in Figure 3A.

Mounting evidence suggests eukaryotic transcription occurs in
localized factories [43,44]. Transcription factories may exist to
provide coordinated expression of coregulated genes. By uniting
distant regions of DNA they may also serve as sites to share specific
or limiting regulatory factors, and may be required for high levels
of transcription. We observed that in cell lines with increased
IGFBP3 mRNA there is also an increase in the interaction
frequency of IGFBP3 with EGFR. The relationship between
interaction frequency and expression is nonlinear, and we expect
other factors are modulating expression, such as the observed
hypermethylation of the IGFBP3 promoter. Additional factors may
include crosstalk between IGFBP3 and EGFR signaling pathways
and tumor heterogeneity. Our data suggest that IGFBP3 and
EGFR may share a common transcriptional hub or factory, and
disruption of these interactions could play a role in tumor
progression. Reduction of the IGFBP3-EGFR interaction may not
only affect these genes, but could result in new long-range
interactions.

Cytogenetic and molecular evidence suggests spatial proximity
influences recurrent chromosomal translocations [45,46,47,48]. In
response to genotoxic stress, oncogenic translocations could
potentially form when DNA breaks occur within an interacting
"hub". This was demonstrated in prostate cancer cells, where
irradiation led to translocations among genes with
hormone-induced proximity [49,50].

From our 4C data, we found that the breast carcinoma
amplified sequence family of genes (BCAS1, BCAS2, BCAS3 and
BCAS4) interacts with IGFBP3. BCAS1 has been found amplified in
primary breast tumors [51] and associated with a poor prognosis
[52]. BCAS2 can function as a transcriptional coactivator of
estrogen receptor [53] as well as a negative regulator of P53 [54].
BCAS3 is overexpressed and associated with impaired response to
tamoxifen in ER-positive premenopausal breast cancers [55]. Fine
mapping of breakpoints in MCF7 revealed BCAS3 to be located in
a rearrangement hotspot, where 7 breakpoints were observed
within BCAS3 and 19 in the surrounding region of the gene [31].
One of the translocation partners of BCAS3 is BCAS4, and fusion
transcripts have been detected in MCF7 and HCT116 colon
cancer cells [31,56]. Additionally, BCAS4 was found overexpressed
in nine out of 13 different breast cancer cell lines [57]. The BCAS
genes are frequently amplified and some have been found to
translocate with each other in breast tumors, such as
BCAS4-BCAS3 and BCAS1-BCAS3. Interestingly, using 3D FISH in
normal breast cells, we found BCAS4-BCAS3 and BCAS1-BCAS3
to interact with one another as well as IGFBP3, supporting the role
of spatial proximity in oncogenic translocations. All pairwise
interactions, defined as being equal to or less than 1 micron,
occurred in 4% or greater of HMEC nuclei. This is similar to the
association levels measured for loci participating in interchromosomal
interactions identified using the tethered chromosome
conformation capture assay [33]. It is also similar to colocalization
levels of genes that occupy specialized transcription factories in
mouse erythroid nuclei [32]. We chose to verify 4C interactions
with 3D FISH, as the interacting regions can be large, consisting
of windows of 100 or 200 restriction sites. 3C would provide better
resolution, but applying it to such large regions would be
challenging given the number of primers that would be
needed, because testing for an interaction between only two specific
elements with 3C is not technically sound.

Although all interactions were present within the population of
cells, there was not a simultaneous association of all three loci. This
suggests the long-range interactions of the BCAS genes with
IGFBP3 and with one another are dynamic in nature, and
illustrates the heterogeneity of chromatin architecture within a cell
population. Chromatin displays rapid constrained motion over
distances of ~1 micron, and longer directional movement of
chromatin domains has been associated with gene expression [58].
We note that 3D FISH experiments were performed in cycling
cells. Because these data are limited to interphase cells, we do not expect
cell cycle stage to have a major effect on our results. As the field progresses we will
likely see 4D studies incorporating cell cycle stages; there have
already been correlations drawn between Hi-C data and
replication timing [59].

It remains to be seen what role trans-acting factors play in
mediating these long-range interactions. In the case of prostate
cancer, the androgen receptor was shown to rapidly induce
long-range interactions both in cis and in trans following ligand binding
[49,50]. Estrogen was also shown to induce rapid interchromosomal
interactions among estrogen receptor α (ERα) regulated
genes [5,60]. In addition to nuclear receptor mediated long-range
interactions, increased expression of the architectural protein
SATB1, which participates in chromatin loop formation, alters the
expression of over 1000 genes and is associated with aggressive
breast cancer [61]. Whatever the mechanism governing long-range
interactions, it is likely to involve a combination of
chromatin remodeling complexes and possibly nuclear motor
proteins. Along these lines, chromatin interacting with IGFBP3 in
the breast cancer cell lines was significantly enriched for
methylated promoters relative to HMEC, with MCF7 showing
the greatest fold increase. The IGFBP3 promoter is hypermethylated
in MCF7, and this may indicate a preference for chromatin
domains with similar modifications to associate.

It is notable that a large proportion of the MCF7 translocation
breakpoints fall within 4C windows. To rule out artifacts due to an
interaction with a breakpoint near our bait, we checked for
breakpoints proximal to IGFBP3, but found none within ±5 Mb.
It is important to note that MCF7 breakpoints mapped in HMEC
reflect areas of potential translocations. It is interesting that all
HMEC 4C windows containing translocation breakpoints were
also present in MCF7, where breakage had occurred. There was
also an increase in breakpoints with both ends mapping to 4C
windows in MCF7 as compared to HMEC. In these instances, the
IGFBP3 long-range interactions present in normal breast cells were
maintained in the tumor cells, and additional interactions with the
reciprocal breakpoints were formed due to rearrangements in the
cancer cell line. This indicates a preference for gaining trans
interactions even after large-scale genomic aberrations occur.

Our study demonstrates that long-range interactions of cancer
related loci, including EGFR and IGFBP3, are altered in breast
cancer cells, and these alterations are frequently associated with
epigenetic changes. Long-range interactions influence chromosomal
translocations, and add an additional layer of complexity to
transcriptional and epigenetic regulation to coordinate gene
expression. Therefore, a better understanding of aberrant chromatin
interactions is needed to fully understand cancer pathology.

Methods 

Cell  Culture 

Primary human mammary epithelial cells, HMEC (Life
Technologies, Grand Island, NY), were cultured in HuMEC
Ready Medium (Gibco, Grand Island, NY) with 1%
penicillin-streptomycin (Gibco). Human breast cancer cell lines MCF7 and
MDA-MB-231 (ATCC, Manassas, VA) were grown in Dulbecco's
Modified Eagle Medium (DMEM) with high glucose, sodium
pyruvate, GlutaMAX media supplemented with 10% fetal bovine
serum, 1% penicillin-streptomycin (Gibco) at 37°C in 5% CO2.

Circular  Chromosome  Conformation  Capture  (4C) 
Sequencing  Assay 

4C was performed as in Gheldof et al. with minor modifications
[62]. HMEC, MCF7 and MDA-MB-231 cells (2 × 10^7) were fixed
in 2% formaldehyde in fresh medium for 10 min at room
temperature, followed by quenching with 0.125 M glycine. Fixed
cells were scraped from culture plates and spun (750 × g for 10 min),
and the frozen pellets were stored at -80°C until lysis. Cells were
resuspended in ice-cold lysis buffer (0.2% IGEPAL CA-630,
10 mM NaCl, 10 mM Tris-HCl) with SigmaFast complete
protease inhibitor tablet (Sigma-Aldrich, St. Louis, MO) and lysed
for 30 min on ice. After recovery of nuclei by centrifugation
(2000 × g for 5 minutes), nuclei were washed twice in cold 1.2×
NEB buffer 2 and resuspended in the same buffer. Nuclei were
incubated in the presence of 0.3% SDS for 1 h at 37°C with
shaking at 950 rpm, followed by the addition of Triton X-100 to
1.8% for 1 h at 37°C with shaking at 950 rpm. Nuclei were
digested with 1500 U of HindIII (New England Biolabs, Ipswich,
MA) overnight at 37°C with shaking at 950 rpm. 200 μl of
digested nuclei were removed for assessing digestion efficiency by
qPCR. The restriction enzyme was inactivated by the addition of
1.6% SDS and incubation at 65°C for 20 min. The digested
nuclei were diluted in 7 ml of 1.1× T4 DNA ligase buffer in the
presence of 1% Triton X-100 and incubated for 1 h at 37°C.
Ligation was performed by adding 800 U of T4 DNA Ligase
(2,000,000 U/ml; New England Biolabs) to the diluted mixture of
digested nuclei and incubating in a 16°C H2O bath for 4 hours
followed by a 30 min incubation at room temperature. To reverse
cross-links, proteinase K was added to a final concentration of
100 μg/ml and incubated overnight at 65°C. Samples were
incubated with 0.5 μg/ml of RNase A at 37°C for 1 h and purified
by phenol-chloroform extraction followed by ethanol precipitation.
DNA concentration was measured using a Qubit® 2.0
Fluorometer (Life Technologies).

3C templates were digested with 200 U MspI (New England
Biolabs) overnight at 37°C with shaking at 500 rpm, followed by
heat inactivation at 65°C for 20 min. Digestion products were
purified by phenol-chloroform extraction and ethanol precipitation.
Ligations were performed in 14 ml of 1× T4 DNA ligase
buffer with 2000 U of T4 DNA ligase. Circular ligation products
were purified by phenol-chloroform extraction and ethanol
precipitation followed by clean up with Ampure beads (Beckman
Coulter, Brea, CA). A total of 16 inverse PCR reactions with 200
ng input per 4C template were performed for each library with
primers that included Illumina adapter sequences and custom
barcodes. All PCR reactions were performed with Expand Long
Template PCR system (Roche, Indianapolis, IN). Excess primers
were removed by gel extraction. HMEC, MCF7 and MDA-MB-231
4C libraries were analyzed on a MultiNA microchip
electrophoresis system (Shimadzu, Columbia, MD) and mixed in
equimolar amounts. Multiplex sequencing was performed on an
Illumina Genome Analyzer IIx (Illumina, San Diego, CA). Illumina
sequencing data have been submitted to the GEO database
(accession number GSE49521).

Mapping  and  Filtering  of  4C  Reads 

We first demultiplexed the 76 bp single-end reads using
barcodes for each cell line. We only retained the reads that
contained one of the valid barcodes followed by the primer
sequence and a HindIII cleavage site and truncated them to obtain
the prey sequence. We mapped the truncated reads to the human
genome (UCSC hg19) using the short read alignment mode of
BWA (v0.5.9) with default parameter settings. We post-processed
the alignment results to extract the reads that satisfied the
following three criteria: (i) mapped uniquely to one location in the
reference genome, (ii) mapped with an alignment quality score of
at least 30 (which corresponds to a 1 in 1000 chance that the mapping is
incorrect), (iii) mapped with an edit distance of at most 3. We
assigned the qualified reads to the nearest HindIII cleavage site
using their mapping coordinates. We then identified the restriction
fragments interacting with the bait region (those flanking the
cleavage sites with a read count of at least one). We discarded the ±50 kb
region around the bait from further analysis.
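
The three read-level criteria, the assignment to the nearest cleavage site, and the bait exclusion can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the `Alignment` record, the function names, the cut-site positions, and the bait coordinate are all hypothetical, and treating exactly 50 kb as outside the excluded region is an arbitrary choice.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class Alignment:
    chrom: str
    pos: int
    mapq: int            # Phred-scaled mapping quality
    edit_distance: int
    unique: bool         # mapped to exactly one location

def phred_to_error(q):
    """Probability that a mapping with Phred-scaled quality q is wrong."""
    return 10 ** (-q / 10)

def passes_filters(aln, min_mapq=30, max_edits=3):
    """Criteria (i)-(iii): unique, quality >= 30, edit distance <= 3."""
    return aln.unique and aln.mapq >= min_mapq and aln.edit_distance <= max_edits

def nearest_site(pos, sites):
    """Nearest position in `sites` (sorted ascending, non-empty) to `pos`."""
    i = bisect_left(sites, pos)
    candidates = sites[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s - pos))

def near_bait(chrom, pos, bait_chrom, bait_pos, flank=50_000):
    """True if the position falls inside the excluded region around the bait."""
    return chrom == bait_chrom and abs(pos - bait_pos) < flank

# Hypothetical cut sites on one chromosome and a placeholder bait position.
sites = [100, 500, 900]
bait_chrom, bait_pos = "chr7", 1_000_000

reads = [
    Alignment("chr7", 260, 37, 1, True),         # kept, assigned to site 100
    Alignment("chr7", 260, 12, 0, True),         # dropped: low quality
    Alignment("chr7", 1_020_000, 37, 0, True),   # dropped: within 50 kb of bait
]
kept = [r for r in reads
        if passes_filters(r) and not near_bait(r.chrom, r.pos, bait_chrom, bait_pos)]
print([(r.chrom, nearest_site(r.pos, sites)) for r in kept])
```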

Statistical  Analysis  of  4C  Data 

We first identified all the HindIII sites in the genome (~840 k)
and eliminated the ones with no MspI site within 2 kb downstream
of the HindIII site, resulting in ~470 k restriction fragments for
downstream analysis. In order to avoid PCR artifacts, we
binarized the interaction counts as was done previously in other
4C analysis pipelines [63]. This processing resulted in 23,559,
19,876 and 16,387 restriction fragments that interact with the bait
region for HMEC, MCF7 and MDA-MB-231 cell lines, respectively.
In order to account for the difference in the number of
interacting fragments between cell types and the effect of genomic
distance on the intrachromosomal interaction probability, we
applied a statistical significance assignment procedure similar to
the one described in Splinter et al. [63]. We first separated
interactions into four groups depending on the linear distance of
interacting loci to the bait.

1. Bait region interactions: Intrachromosomal interactions within
50 kb of the bait; these are excluded from our analysis.

2. Proximal intrachromosomal interactions: Intrachromosomal
interactions between 50 kb and 2 Mb from the bait.

3. Long-range intrachromosomal interactions: Intrachromosomal
interactions more than 2 Mb from the bait.

4. Interchromosomal interactions: Interactions that are on
chromosomes other than the bait chromosome (chr 7).
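The four groups above amount to a simple distance-based classification, sketched here in Python. The function name, the group labels and the handling of values that fall exactly on a boundary (50 kb and 2 Mb are both assigned to the proximal group) are assumptions made for illustration; the bait coordinate is passed in as a parameter because it is not stated in this section.

```python
def classify_interaction(chrom, pos, bait_chrom, bait_pos,
                         bait_limit=50_000, proximal_limit=2_000_000):
    """Assign an interacting locus to one of the four distance groups."""
    if chrom != bait_chrom:
        return "interchromosomal"      # group 4
    distance = abs(pos - bait_pos)
    if distance < bait_limit:
        return "bait_region"           # group 1, excluded from the analysis
    if distance <= proximal_limit:
        return "proximal"              # group 2
    return "long_range"                # group 3
```

For example, with a hypothetical bait at position 1,000,000 on chr7, a locus at 1,500,000 on chr7 is classified as proximal and any locus on chr1 as interchromosomal.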

We then combined multiple consecutive restriction fragments
using window sizes appropriate for each of the groups
above. This step is necessary because of the limited resolution of
current 4C methods, and it enables us to assign statistical confidence
to interactions at varying resolutions. We used window sizes of 10, 20
and 40 fragments for group 2; 50, 100 and 200 for group 3; and 100,
200 and 400 for group 4 interactions. For each group of interactions, we
counted the number of interacting fragments within a window for
each window size. We then generated a background distribution
by randomly shuffling the interacting and non-interacting
fragments for each group and repeating this randomization 100
times. For intrachromosomal interactions, we took into account
the linear distance of each region to the bait when generating the
background. For interchromosomal interactions, we generated the
background by aggregating all chromosomes (unlike Splinter et al.
[63], who generate one background per chromosome) to
preserve the information from possible chromosome territory
associations that include chromosome 7. Similar to Splinter et al.
[63], we calculated the z-score threshold at which the false
discovery rate (FDR) is 0.01 to determine the windows that
significantly interact with the 4C bait (4C-enriched windows/
regions). To determine cell-line-specific 4C-enriched regions at a
given window size, we took the list of regions that were
deemed interacting at FDR 0.01 in one cell line and not in the
other.
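The windowing and shuffling steps can be illustrated with a short Python sketch. It is a simplified stand-in rather than the procedure of the study: it shuffles all fragments of a group uniformly, so it does not model the distance dependence used for the intrachromosomal background, and it stops at z-scores without deriving the FDR 0.01 threshold. All function names are hypothetical.

```python
import random
import statistics


def binarize(read_counts):
    """Collapse per-fragment read counts to 1 (interacting) or 0 (not)."""
    return [1 if count > 0 else 0 for count in read_counts]


def window_counts(binary, window):
    """Number of interacting fragments in each run of `window` consecutive fragments."""
    if window > len(binary):
        raise ValueError("window is larger than the number of fragments")
    running = sum(binary[:window])
    counts = [running]
    for i in range(window, len(binary)):
        running += binary[i] - binary[i - window]
        counts.append(running)
    return counts


def window_z_scores(binary, window, n_shuffles=100, seed=0):
    """Z-score of each observed window against a shuffled background."""
    rng = random.Random(seed)
    shuffled = list(binary)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        background.extend(window_counts(shuffled, window))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    observed = window_counts(binary, window)
    if sd == 0:
        return [0.0] * len(observed)
    return [(count - mean) / sd for count in observed]


def cell_line_specific(enriched_a, enriched_b):
    """Regions called enriched in cell line A but not in cell line B."""
    return set(enriched_a) - set(enriched_b)
```

Fixing the random seed keeps the background reproducible between runs; windows whose z-score exceeds the chosen threshold would then be reported as 4C-enriched.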

3D Fluorescence in situ Hybridization

Cells grown on 12 mm coverslips were fixed in 4%
paraformaldehyde (PFA) for 10 min, made permeable with 0.5% Triton
X-100 for 5 min, incubated in 20% glycerol/1× PBS for at least
40 min, freeze-thawed in liquid nitrogen four times, and treated
with 0.1 N HCl for 5 min. Cells were then treated with RNase A
for 45 min at 37°C. Coverslips were then stored in 50%
formamide/2× SSC at 4°C until denaturation at 75°C for
7 min in 70% formamide/2× SSC followed by immersion in
ice-cold 50% formamide/2× SSC.

BAC probes RP11-89E8, RP11-108317, RP11-55E1,
RP11-1115J10, RP11-705A3, RP11-805G4, RP11-185P21,
RP11-1058F18, RP11-937E18 and RP11-5P14 (Roswell Park Cancer
Institute, Buffalo, NY) were labeled with dinitrophenol-11-dUTP
(PerkinElmer, Waltham, MA), Alexa488-dUTP or Alexa594-dUTP
(Life Technologies) by nick translation (Roche). Probes in
50% formamide/2× SSC/10% dextran sulfate were denatured
for 8-10 min at 75°C. Probes were cooled on ice and hybridized
for 36-48 h at 37°C, followed by three post-hybridization washes
with 50% formamide/2× SSC/0.05% Tween 20, 2× SSC/0.05%
Tween 20, and 1× SSC for 30 min each at 37°C. Detection of
BAC probes was performed by reaction with rabbit anti-DNP (Life
Technologies) diluted 1:1000 and secondary goat anti-rabbit
(1:200) conjugated to Alexa594 or Alexa647 (Life Technologies).
Following labeling, indirect immunofluorescence was detected
with Chroma filter sets using an Olympus BX41 upright
microscope (100× UPLSAPO, oil, 1.4 NA) equipped with a
motorized z-axis controller (Prior Scientific, Rockland, MA) and
Slidebook 5.0 software (Intelligent Imaging Innovations, Denver,
CO). Optical sections of 0.5 µm were collected, deconvolved using
a NoNeighbor algorithm operating within Slidebook 5.0, and 3D
distances were measured from the center of each FISH focus.
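The 3D distance between two FISH foci reduces to a Euclidean distance once voxel coordinates are converted to physical units. The sketch below assumes focus centers are given as (x pixel, y pixel, section index); the lateral pixel size is a required parameter because it is not stated here, while the 0.5 µm section spacing is taken from the text. The function name is hypothetical, and the measurements in the study were made within Slidebook.

```python
import math


def focus_distance_um(focus_a, focus_b, xy_um_per_pixel, z_step_um=0.5):
    """Euclidean distance in micrometres between two focus centers.

    Each focus is (x_pixel, y_pixel, section_index).
    """
    dx = (focus_a[0] - focus_b[0]) * xy_um_per_pixel
    dy = (focus_a[1] - focus_b[1]) * xy_um_per_pixel
    dz = (focus_a[2] - focus_b[2]) * z_step_um
    return math.sqrt(dx * dx + dy * dy + dz * dz)
```

Scaling the axial coordinate separately matters because the section spacing is usually much coarser than the lateral pixel size, so treating voxels as cubes would distort the measured distances.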

CpG Methylation by Bisulfite Pyrosequencing

Genomic DNA from HMEC, MCF7 and MDA-MB-231 cells was
treated with bisulfite using the EZ DNA Methylation kit (ZYMO
Research, Irvine, CA). The locus of interest was amplified using a
combination of forward and biotinylated reverse primers (see
Table S3 for primer sequences). For each 25 µl PCR reaction,
40 ng of bisulfite-treated DNA was used with 2G Robust polymerase
(KAPA Biosystems, Woburn, MA) following KAPA’s
recommended cycling conditions. Pyrosequencing of the resulting
amplicons was performed at the PAN facility, Stanford University,
using a Qiagen Pyromark instrument. Assays were designed using
Pyromark Assay Design software (Qiagen, Valencia, CA). The
methylation indices were calculated as the average percent
methylation of successive CpG dinucleotides between the primers.
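The methylation index defined above is a plain average over the CpG positions covered by an assay, as in this minimal Python sketch. The function name and the input format (one percent-methylation value per CpG) are assumptions for illustration.

```python
import statistics


def methylation_index(percent_methylation_per_cpg):
    """Average percent methylation across the successive CpGs of one assay."""
    if not percent_methylation_per_cpg:
        raise ValueError("at least one CpG value is required")
    return statistics.mean(percent_methylation_per_cpg)
```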

RNA  Extraction  and  Quantitative  RT-PCR 

RNA was extracted from HMEC, MCF7 and MDA-MB-231
cells using the RNeasy Mini Kit and QIAshredder mini column
(Qiagen) according to the manufacturer’s instructions. DNA was
digested on the column using the RNase-free DNase set (Qiagen). 1 µg
of RNA was reverse transcribed with SuperScript III First-Strand
Synthesis SuperMix for qRT-PCR (Life Technologies). qRT-PCR
was performed using KAPA SYBR Fast ABI PRISM qPCR mix
(KAPA) on an ABI 7900HT Real-Time PCR System (Applied
Biosystems). Primers were purchased from RealTimePrimers.com.
The most stable reference genes (ACTB and GAPD) were selected
from a set of 10 using geNorm software [64]. Reaction efficiency
for each primer set was calculated using Real-time PCR Miner
[65], and fold change of target genes relative to HMEC was
calculated using the Pfaffl method [66].
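The Pfaffl method computes an efficiency-corrected expression ratio, E_target^(Ct_control − Ct_sample) divided by the corresponding term for the reference gene. The Python sketch below illustrates this; combining two reference genes through the geometric mean of their terms is an assumption made here, not a detail stated in the text, and the function name and argument layout are hypothetical. Efficiencies are expressed as amplification factors per cycle (2.0 for perfect doubling).

```python
import math


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample, reference_genes):
    """Efficiency-corrected fold change of a target gene in a sample vs. a control.

    reference_genes: list of (efficiency, ct_control, ct_sample) tuples.
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_terms = [eff ** (ct_control - ct_sample)
                 for eff, ct_control, ct_sample in reference_genes]
    # Assumption: several reference genes are combined by their geometric mean.
    ref_term = math.prod(ref_terms) ** (1 / len(ref_terms))
    return target_term / ref_term
```

Here the control would be HMEC and the sample MCF7 or MDA-MB-231. With a perfectly efficient target assay that crosses threshold two cycles earlier in the sample and an unchanged reference gene, the ratio is 4.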

Figure S1 IGFBP3 4C-Seq Bait. The bait sequence (top, red
bar) flanks a HindIII site upstream of IGFBP3 in a region classified
as a strong enhancer (orange bar). Image generated with the UCSC
Genome Browser, hg19.

Figure S2 Distribution of the significant 200 restriction
site interchromosomal windows for HMEC, MCF7 and
MDA-MB-231. Percent of total interactions per cell line are
plotted for each chromosome.

(JPG)

Table S1 Sequence read distribution (not corrected for local
interactions).

Table S2 Distribution of methylated promoter CpG nucleotides
relative to HMEC.

Table S3 Methylation assay primer sequences.

Abstract The insulin-like growth factor type I receptor (IGF1R) is frequently dysregulated in
breast cancers, yet the molecular mechanisms are unknown. A novel intragenic long non-coding
RNA (lncRNA), IRAIN, within the IGF1R locus was recently identified in haematopoietic
malignancies using RNA-guided chromatin conformation capture (R3C). In breast
cancer tissues, we found that IRAIN lncRNA was transcribed from an intronic promoter in
an antisense direction as compared to the IGF1R coding mRNA. Unlike the IGF1R coding
RNA, this non-coding RNA was imprinted, with monoallelic expression from the paternal
allele. In breast cancer tissues that were informative for single nucleotide polymorphism
(SNP) rs8034564, there was an imbalanced expression of the two parental alleles, where the
‘G’ genotype was favorably imprinted over the ‘A’ genotype. In breast cancer patients, IRAIN
was aberrantly imprinted in both tumours and peripheral blood leucocytes, exhibiting a
pattern of allele switch: the allele expressed in normal tissues was inactivated and the normally
imprinted allele was expressed. Epigenetic analysis revealed that there was extensive DNA
demethylation of CpG islands in the gene promoter. These data identify IRAIN lncRNA as
a novel imprinted gene that is aberrantly regulated in breast cancer.

1. Introduction

Despite recent advances in molecular therapeutics,
breast cancer remains a highly lethal malignancy
worldwide [1]. Anti-human epidermal growth factor receptor
2 (HER2) antibody therapy using Herceptin has been
successful in the treatment of HER2-positive early stage
and metastatic breast cancers [2,3]. However, resistance
to Herceptin therapy has become an obstacle for
treatment of HER2-positive breast cancer patients [4,5].
The activation of alternative growth factor pathways,
particularly via the insulin-like growth factor 1 receptor
(IGF1R), represents a common feature of
Herceptin-refractory cells [6].

IGF1R is one of the most abundantly phosphorylated
receptor tyrosine kinases in tumours [7-9]. The
insulin-like growth factor system, including the type I IGF
receptor IGF1R and the mitogenic ligands IGF-I and
IGF-II, is frequently dysregulated in breast cancer and
is known to contribute to disease progression and
metastasis [10-14]. IGF-I and IGF-II promote cell
growth and survival via IGF1R-mediated
signal transduction through an intracellular tyrosine kinase
linked to the phosphatidyl-inositol-3 kinase (PI3K)-Akt-
mammalian target of rapamycin (mTOR) pathway.
Overexpression of IGF1R activates the PI3K and
mitogen-activated protein kinase (MAPK) signal cascades,
resulting in cell proliferation and resistance to
chemotherapeutic agents, radiation, and targeted therapies
using Tamoxifen and Herceptin [15-17]. Therapeutic
agents targeting IGF1R are currently in clinical
development [10-14,18-23], including those that inhibit the
IGF1R tyrosine kinase using monoclonal antibodies
and small molecules [24]. However, the clinical
development of various IGF1R inhibitors has been put on hold
due to lack of sufficient clinical efficacy. Thus, the
regulation of this pathway needs to be further defined to aid
in the development of next generation regimens.

Currently, the molecular mechanisms underlying the
dysregulation of the IGF1R pathway in tumours remain
unknown. Using a recently developed R3C (RNA-guided
Chromatin Conformation Capture) technique [25], we
recently identified a novel long non-coding RNA
(lncRNA), IRAIN, within the IGF1R locus [26]. IRAIN is
transcribed from an intragenic promoter located in the
first intron of IGF1R. IRAIN lncRNA is transcribed in
an antisense orientation compared with the IGF1R gene,
and it is expressed exclusively from the paternal allele,
with the maternal allele being silenced. Interestingly, this
lncRNA interacts with chromatin DNA and is involved
in the formation of an intrachromosomal
enhancer/promoter loop. In addition, IRAIN was downregulated in
leukaemia cell lines and in leucocytes from patients with
high-risk AML [26]. These data suggested that IRAIN
might play a role in the dysregulation of the IGF pathway
in haematopoietic malignancies.

However, the function of this non-coding RNA in
other malignancies remains to be explored. The IGF1R
pathway is frequently dysregulated in breast cancer. It
is unclear if IRAIN lncRNA is aberrantly imprinted in
breast cancer patients. In this communication, we
characterise the allelic expression of IRAIN lncRNA in a
cohort of breast cancer samples.


2.  Materials  and  methods 

2.1.  Breast  cancer  cell  lines  and  tissues 

Breast cancer cell lines (MCF7 and MDA-MB-231)
used in this study were purchased from ATCC. Cells
were grown in RPMI 1640 medium supplemented with 10%
foetal bovine serum (FBS), 100 U/ml penicillin and
100 µg/ml streptomycin.

Breast tumour specimens were collected from 74
female patients with invasive breast cancer who were
treated at The First Hospital of Jilin University between
2007 and 2010. All tumour samples were obtained from
patients with invasive ductal carcinomas (IDC) (n = 74)
(Table S1). Normal breast tissues (n = 9) were collected
as control samples from patients who either
underwent prophylactic mastectomy (n = 3) or in whom
normal breast tissue was removed at a site distant from the
primary tumour (n = 6). The protocol was approved by
the Human Medical Ethical Review Committee of
Jilin University First Hospital, and informed consent
was obtained from each breast cancer patient and
normal subject.

Samples were snap frozen in liquid nitrogen at the
time of the pre-therapeutic biopsy or surgical treatment
and were stored at −80 °C for total RNA and genomic
DNA extraction. The pathological diagnosis was made
in accordance with the histological classification of
tumours developed by the World Health Organisation.
Tumour stage was defined according to the American Joint
Committee on Cancer/International Union Against
Cancer tumour, node, metastasis (TNM) classification
system. Tumours were histologically graded according
to the Elston and Ellis method. Molecular markers,
including the oestrogen receptor (ER), the progesterone
receptor (PR), human epidermal growth factor receptor
2 (HER2) and the mitotic index Ki67, were examined
using immunohistochemical (IHC) methods. Patients
with HER2/neu 2+ were tested for gene amplification
using fluorescence in situ hybridisation (FISH) for
validation. Breast cancer molecular subtype was defined by
IHC receptor status according to the St
Gallen International Breast Cancer Conference (2013).
We divided the patients into three groups: (1) triple
negative breast cancer (TNB; ER−, PR−, HER2−), (2)
HER2+ (ER−, PR−, and HER2+), (3) luminal [‘Luminal
A-like’: ER+ and PR+ (≥20%), HER2−, Ki67 < 14%;
‘Luminal B-like (HER2 negative)’: ER+, HER2−, and
at least one of: Ki-67 ≥ 14%, PR−, PR+ (<20%);
‘Luminal B-like (HER2 positive)’: ER+, HER2+, any Ki-67,
any PR]. After pathologic diagnosis, patients were
treated according to standard clinical protocols. Clinical
data such as date of birth, sex and date of surgery were
extracted from the computerised clinical database.

Peripheral blood leucocytes (PBL) collected from
breast cancer patients were isolated by Ficoll-Hypaque
(Sigma, MO) centrifugation and then cryopreserved for
DNA and RNA analyses.

2.2.  Reverse  transcription-polymerase  chain  reaction 
(PCR)  analysis 

As previously described [25,27], total RNA was
extracted from tissues by TRI-REAGENT (Sigma,
CA) and cDNA was synthesised with RNA reverse
transcriptase. Briefly, 1 µg of total RNA was used, and
PCR was carried out under liquid wax in a 6 µl reaction
mixture containing 2 µl of 3× Klen-Taq I Mix, 2 µl
cDNA and 1 µl of each 2.5 µM primer. After incubation
at 95 °C for 2 min, IRAIN cDNA was amplified by 32
cycles of 95 °C for 30 s, 65 °C for 30 s of annealing
and 72 °C for 35 s of extension, followed by a final
extension at 72 °C for 5 min.

For real-time quantitative polymerase chain reaction
(qPCR), cDNA samples were amplified on a CFX96™
real-time system (BIO-RAD) with the SYBR PrimeScript™
RT-PCR Kit (Takara, Japan). The expression
levels of IRAIN and IGF1R were quantitated by
normalising to β-actin (housekeeping gene) as previously
described [27,28]. PCR primers used for qPCR and
RT-PCR are listed in Table S2.

2.3.  Gene  strand-specific  RT-PCR 

The orientation of IRAIN was mapped with a
strand-specific PCR (SSRT) assay [26,27]. Total RNA was
extracted from tissues by TRI-REAGENT (Sigma,
MA). Total RNA (400 ng) was reverse transcribed with
the IRAIN 5'- or 3'-primers using Maxima Reverse
Transcriptase (Thermo Fisher Scientific, CA) at 60 °C
for 50 min, followed by 85 °C for 5 min to inactivate
the transcriptase. After 10-fold dilution, PCR was
carried out under liquid wax in a 6 µl reaction containing
2 µl of 3× Klen-Taq I Mix, 2 µl cDNA and 1 µl of each
2.5 µM downstream PCR primer. After initial
denaturing at 95 °C for 2 min, IRAIN cDNA was amplified
by 32 cycles of 95 °C for 30 s, 65 °C for 30 s of annealing
and 72 °C for 35 s of extension, followed by incubation
at 72 °C for 5 min. PCR primers used for strand-specific
PCR are listed in Table S2.

2.4.  Allelic  expression 

Quantitation of allelic expression requires the
presence of heterozygous single nucleotide polymorphisms
(SNPs) to distinguish the two parental alleles. We
extracted genomic DNA from breast cancer samples
and screened for heterozygosity of SNPs in IRAIN
lncRNA [26]. Only SNP-informative breast cancer
samples were used for IRAIN imprinting analysis.
For imprinting assessment, however, the allelic
distribution in the parents was also needed to track which
parental allele was expressed. In these studies, only those
cases with parental information available were included.
To examine differential allelic expression of IRAIN
lncRNA between tumours and peripheral blood
leucocytes (PBL), only those informative cohort cases with
available blood samples were included in the study.

Total RNA extraction and cDNA synthesis were
performed as previously described [29,30]. Allelic expression
of IRAIN was examined using the same program as in the
RT-PCR but using primers flanking the polymorphic
restriction site. Allelic expression of IRAIN was
assessed with the polymorphic restriction enzyme NdeI. In
some cases, DNA sequencing of genomic DNA and
cDNA PCR products was used to determine allelic
expression of IRAIN. PCR primers used to assess allelic
expression covering the NdeI polymorphic site were
JH891 and JH892, and primers for allelic sequencing
using SNP rs8034564 were JH248 and JH781 (listed in
Table S2).

2.5.  DNA  methylation  analysis 

Genomic DNA was extracted from cells and tumours
with the Perfect gDNA kit (Eppendorf, NY). Genomic
DNA (1 µg) was used for bisulphite conversion with the
EZ DNA Methylation-Gold™ Kit (ZYMO Research,
CA) according to the manufacturer’s instructions.
PCR reactions were performed using KlenTaq1 DNA
polymerase (Ab Peptides, MO). Bisulphite-sequencing
PCR (BSP) was used to analyse DNA methylation
status. PCR conditions were 97 °C for 10 min followed
by 35 cycles of 96 °C for 20 s, 64 °C for 30 s of
annealing and 72 °C for 30 s of extension, and a final
extension at 72 °C for 10 min. The primers used for assessing
DNA methylation were JH852 and JH855 (Table S2)
[26]. PCR products were separated by gel
electrophoresis and purified with the Axygen DNA Gel Extraction kit
(Axygen, CA). The PCR products were cloned into the
CloneJET vector using a PCR Cloning Kit (Thermo
Scientific #K1231, MA) and sequenced for analysis of
CpG methylation.

2.6.  Statistical  analysis 

All experiments were performed in triplicate, and the
data are expressed as mean ± SD. The data were
analysed by one-way analysis of variance, and results were
considered statistically significant at P < 0.05.
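For orientation, the F statistic behind a one-way analysis of variance can be computed directly from group data, as in the sketch below. It is a plain-Python illustration with a hypothetical function name and returns only the F statistic and its degrees of freedom; converting F to a P value requires the F distribution, for which a statistics library routine such as `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((value - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for value in group)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For three groups of triplicate measurements, the degrees of freedom are 2 and 6.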

3.  Results 

3.1. Ubiquitous expression of IGF1R intragenic non-coding
RNA in breast cancers

In studying the tumour-specific dysregulation of
IGF1R, we recently identified a novel 5366 bp
IGF1R-intragenic long non-coding RNA (IRAIN) that is
associated with haematopoietic malignancies [26]. To
determine if this non-coding RNA is aberrantly
regulated in breast cancer, we first used RT-PCR to
determine the presence of this non-coding RNA in breast
cancer tissues. We found that IRAIN lncRNA was
ubiquitously expressed at various levels in the breast cancer
samples (Fig. 1A).

We then grouped the breast cancer patients into three
groups: (1) Triple-negative breast cancer, (2) HER2+
and (3) Luminal (luminal A-like, luminal B1-like,
luminal B2-like) as described in Section 2. As shown in
Fig. 1B, IRAIN lncRNA was downregulated in both
TNB and HER2+ groups (P < 0.05).

3.2.  IRAIN  is  transcribed  antisense  to  IGF1R  in  breast 
cancers 

We  then  used  a  strand-specific  RT-PCR  (SSRT) 
method  to  map  the  orientation  of  gene  transcription. 
SSRT  cDNA  was  synthesised  by  Thermo-stable  reverse 
transcriptase  utilising  a  5'-specific  oligonucleotide  or  a 
3'-specific  oligonucleotide,  respectively.  After  SSRT,  a 
pair  of  downstream  PCR  primers  was  used  to  amplify 
the  strand-specific  cDNA  (Fig.  2A). 

As  seen  in  Fig.  2B,  IRAIN  RNA  was  detected  only 
when  cDNA  was  synthesised  using  5'-oligonucleotides 
(#513,  #400)  (lanes  1,  4,  7,  10).  No  PCR  products  were 
amplified  when  the  3'  oligonucleotides  were  used  (#514, 
#401,  lanes  2,  5,  8,  11)  or  in  the  RT-minus  controls 
(lanes  3,  6,  9,  12),  indicating  that  IRAIN  was  transcribed 
in  the  antisense  direction  as  compared  with  the  IGF1R 
coding  RNA. 


3.3. Monoallelic expression of IRAIN in breast cancer
tissues

In the mouse, the gene encoding the Type 2 IGF
receptor (Igf2r) is associated with an lncRNA, Airn, that
is transcribed antisense to Igf2r. These transcripts are
reciprocally imprinted, with Airn transcribed from the
paternal allele only. The transcription of the antisense
lncRNA Airn regulates in cis the allelic expression of
the Igf2r coding RNA [31-34]. In leukaemia cells, we
showed that IRAIN was expressed solely from the
paternal allele [26]. To learn if IRAIN uses a similar epigenetic
mechanism to regulate genes locally in breast cancers, we
examined if IRAIN lncRNA was monoallelically
expressed in the MCF7 breast cancer cell line, which is
heterozygous for the polymorphic NdeI restriction site.
Two alleles, termed ‘A’ and ‘G’, were detected in genomic
DNA (Fig. 3A, lanes 2-3). In cDNA samples, however,
only the ‘A’ allele was detected (lanes 5, 6). The other
parental allele (G), in contrast, was totally suppressed.
These data indicate that IRAIN lncRNA is
monoallelically transcribed in the MCF7 breast cancer cell line.

We then examined the allelic expression of IRAIN
lncRNA in breast cancer tissue samples using SNP
rs8034564 to distinguish the two parental alleles. As this
SNP does not contain a restriction enzyme site, PCR
sequencing was used to determine the allelic expression
of IRAIN. In three breast cancer tissues that were
heterozygous for this SNP, both the ‘A’ and ‘G’ alleles were
observed in genomic DNAs (gDNA). However, in all
cDNA samples tested, only a single parental allele (A)
was detected (Fig. 3B), suggesting that IRAIN is
monoallelically expressed in breast cancer.

Fig. 1. Downregulation of IRAIN long non-coding RNA (lncRNA) in breast cancer. (A) Ubiquitous expression of IRAIN in breast cancer tissues.
Expression of IRAIN lncRNA was analysed by reverse transcription polymerase chain reaction (RT-PCR). β-Actin was used as the internal
control. Lanes 1-4: HER2+; lanes 5-7: Luminal; lanes 8-10: TNB. (B) Dysregulation of IRAIN lncRNA in breast cancer subtypes. TNB: triple
negative breast cancer (ER−, PR−, HER2−); HER2+: (ER−, PR−, and HER2+); Luminal: [‘Luminal A-like’: ER+ and PR+ (≥20%), HER2−,
Ki67 <14%; ‘Luminal B-like (HER2 negative)’: ER+, HER2−, and at least one of: Ki-67 ≥14%, PR−, PR+ (<20%); ‘Luminal B-like (HER2
positive)’: ER+, HER2+, any Ki-67, any PR]. *p < 0.05 between the groups.

Fig. 2. IRAIN is an antisense long non-coding RNA (lncRNA). (A) Diagram of the IRAIN/IGF1R locus. pIRAIN: IRAIN lncRNA promoter that
is transcribed in antisense; pIGF1R: IGF1R coding RNA promoter that is transcribed in sense. Horizontal arrows: SSRT-PCR primers used to
map the orientation of IRAIN lncRNA. (B) IRAIN lncRNA is an antisense lncRNA. The strand-specific cDNAs were synthesised using either the
5' or the 3' oligonucleotides. A pair of polymerase chain reaction (PCR) primers located between the two cDNA oligonucleotides was then used to
determine the transcription orientation of the IRAIN lncRNA. M: 100 bp marker; input: total RNA collected before SSRT-PCR.

Fig. 3. Monoallelic expression of IRAIN long non-coding RNA (lncRNA). (A) Monoallelic expression of IRAIN lncRNA in the MCF7 breast cancer
cell line. gDNA: heterozygous genomic DNA. Note that only the single ‘A’ allele of IRAIN lncRNA was detected in cDNA samples. (B) Monoallelic
expression of IRAIN lncRNA in breast cancer tissues. In breast cancers that are heterozygous for the IRAIN polymorphic site in genomic DNA,
only the ‘A’ allele was expressed. (C) Parental imprinting of IRAIN lncRNA. Genomic DNA and cDNA from peripheral blood leucocytes (PBL)
were amplified and PCR products were sequenced for the A/G alleles. Note that the IRAIN lncRNA was expressed from the paternal allele.

In examining allelic expression, it was interesting to
note that expression of the ‘A’ allele was favored over
the ‘G’ allele. In 18 breast cancers that were
heterozygous for the polymorphic SNP, 16 tumour cDNAs
expressed the ‘A’ allele alone (Table 1).

Table 1
The favored ‘A’ monoallelic expression of IRAIN in breast cancer
tissues.
[Table body not recovered. Columns: Case, DNA genotype, cDNA allelic
expression. Total: 2 (11%), 16 (89%).]

3.4. IRAIN is imprinted in breast cancer tissues

To determine if IRAIN is imprinted, we tracked the
expression from the paternal or maternal allele using
peripheral blood leucocytes from two patients whose
parents had also donated blood samples. In Case #36,
the father was heterozygous for the A and G alleles
while the mother was homozygous for the A allele.
The patient was informative at the polymorphic site,
carrying both the A and G alleles in the genomic
DNA. In the cDNA sample, however, we detected the
expression of IRAIN lncRNA only from the G allele
that was inherited from the father (Fig. 3C, left panel),
demonstrating that IRAIN is paternally expressed and
maternally suppressed. We also confirmed the paternal
expression in Case #32. The patient was heterozygous
for the A and G alleles. In the cDNA sample, only the
paternal A allele was expressed (right panel). Thus,
IRAIN is maternally imprinted, in agreement with our
previous finding in leukaemia samples [26].
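The reasoning used for Case #36 can be written as a small Python function that infers which parent transmitted the expressed allele. This is an illustrative sketch with a hypothetical name and input format (genotypes as two-letter strings such as "AG"); it reports "uninformative" whenever the family genotypes do not pin down a single parent of origin.

```python
def parent_of_origin(father, mother, child, expressed):
    """Infer whether the expressed allele was inherited from the father or mother."""
    if expressed not in child:
        raise ValueError("expressed allele is not part of the child genotype")
    child_alleles = sorted(child)
    origins = set()
    # Enumerate every transmission (one allele per parent) consistent with the child.
    for paternal_allele in set(father):
        for maternal_allele in set(mother):
            if sorted([paternal_allele, maternal_allele]) != child_alleles:
                continue
            if paternal_allele == expressed and maternal_allele != expressed:
                origins.add("paternal")
            elif maternal_allele == expressed and paternal_allele != expressed:
                origins.add("maternal")
            else:
                origins.add("uninformative")
    if not origins:
        raise ValueError("child genotype is inconsistent with the parents")
    return origins.pop() if len(origins) == 1 else "uninformative"
```

With the Case #36 genotypes (father A/G, mother A/A, patient A/G, G expressed), the G allele can only have come from the father, so the call is paternal. If both parents were A/G, the same patient would be uninformative.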

3.5.  DNA  methylation  in  the  IRAIN  promoter 

The IRAIN promoter is very rich in CpG
dinucleotides. In peripheral blood leucocytes, the promoter
CpG islands were semi-methylated [26]. We analysed
the status of DNA methylation in the IRAIN promoter
in our breast cancer specimens and cell lines (Fig. 4A).
Using bisulphite sequencing, we found that the IRAIN
promoter is totally unmethylated in two breast cancer
cell lines (MCF-7, MDA-MB-231) (Fig. 4B).

We also examined DNA methylation in three breast
cancer tissues that show monoallelic expression. We
observed a hemi-methylation pattern in the IRAIN
promoter in two breast cancer
samples (Cases #5, #9) (Fig. 4C). However, in Case #6,
the IRAIN promoter was almost totally unmethylated,
as was seen in the two breast cancer cell lines. Thus,
compared to peripheral blood leucocytes, breast cancer
specimens had aberrant DNA methylation in the IRAIN
promoter.

Fig. 4. DNA methylation in the regulation of the IGF1R/IRAIN locus. (A) Schematic diagram of CpG islands in the IRAIN promoter. Vertical
lines: location of CpG islands. (B) DNA methylation of the IRAIN promoter in breast cancer cell lines (MCF7 and MDA-MB-231). Genomic
DNAs were extracted from breast cancer cells. After treatment with sodium bisulphite, the IRAIN promoter DNA was amplified and sequenced. Open
circles: unmethylated CpGs; solid circles: methylated CpGs. (C) CpG methylation of the IRAIN promoter in three breast cancer patients.

3.6. Aberrant imprinting of IRAIN lncRNA in breast
cancers

Loss  of  IGF2  imprinting  is  a  very  common  epigenetic 
abnormality  in  cancer  and  may  even  be  a  prognostic 
biomarker  [35].  In  order  to  compare  the  status  of  IRAIN 
imprinting  in  peripheral  blood  leucocytes  and  breast 
cancer  tissues,  we  studied  five  patients  for  whom  both 
blood  and  surgery  samples  were  available  to  track  allelic 
expression. 

As seen in Fig. 5A, all peripheral blood leucocyte
cDNAs showed monoallelic expression of IRAIN. Four
cases (#32, #36, #42, #37) expressed IRAIN lncRNA
monoallelically from the ‘A’ allele, while case #39
expressed the ‘G’ allele. Surprisingly, in contrast to
peripheral blood leucocytes, we found that in two cases
(#36, #42), IRAIN expression switched to the ‘G’ allele
in breast cancer specimens. In case #37, the breast
cancer expressed the ‘A’ allele, while metastatic tumours
switched to ‘G’ allele expression. These data suggest that
in breast cancer tissues, IRAIN can undergo allelic
switching, expressing the opposite allele as compared with
that in circulating cells.


The IRAIN promoter was aberrantly unmethylated in
breast cancer samples (Fig. 5B). Intriguingly, this
aberrant demethylation pattern was also observed in
peripheral blood leucocytes. Thus, it seems that breast cancer
patients may undergo extensive alterations in promoter
epigenotype. In the human IGF2 gene, DNA demethylation
is also a common epigenetic mutation observed in
many human tumours [36–40].

4.  Discussion 

As the IGF1R signalling pathway is often aberrantly
activated in tumours, including breast cancers,
treatments using small molecule inhibitors and antibodies to
block the tyrosine kinase activity have been advanced
in preclinical and clinical testing [24]. In this
communication, we have characterised IRAIN, a novel 5.4 kb
intragenic non-coding RNA within the IGF1R locus in
clinical samples collected from breast cancer patients.
In cancer tissues, IRAIN is expressed in an antisense
orientation within the IGF1R locus. A unique characteristic
of this non-coding RNA is its monoallelic expression in
breast cancers (Fig. 3A and B). By tracking the allelic
expression in patient families, we demonstrate that
IRAIN is transcribed from the paternal allele, while the
copy from the maternal allele is silenced or imprinted
(Fig. 3C), in agreement with the data in haematopoietic
malignancies [26]. Together, our data validate IRAIN as
a new member of the family of imprinted genes [41].

Fig. 5. Aberrant imprinting of IRAIN in breast cancer patients. (A) Allelic switch of IRAIN long non-coding RNA (lncRNA) imprinting. Allelic
expression of IRAIN was examined by DNA sequencing of the single nucleotide polymorphism (SNP) rs8034564. Note the allele switch of IRAIN
lncRNA between the breast cancer tissues and peripheral blood leucocytes. PBL: peripheral blood leucocytes. (B) Aberrant DNA demethylation in
the IRAIN promoter in allelic switching tumours. DNA methylation in CpG islands in the IRAIN promoter was quantitated by bisulphite
sequencing. Open circles: unmethylated CpGs; solid circles: methylated CpGs. Note that in allelic switching tumours, there is extensive DNA
demethylation of the IRAIN promoter in both breast cancer specimens and peripheral blood leucocytes.

By comparing the genotypes in informative tissues, it
is interesting to note that IRAIN lncRNA seems to
prefer expression of the ‘A’ genotype (Table 1). In 18
heterozygous breast tissues examined, 89% expressed the ‘A’
allele. The ‘G’ allele, however, is rarely used by the host
machinery for transcription. Similar cases have been
reported in the TMPRSS2 gene in prostate cancer stem
cells [42]. In addition, stochastic monoallelic expression
is also widespread in mammalian genomes, including
olfactory receptor, V1rb2 receptor, T-cell receptor and
immunoglobulin genes, pheromone receptors, p120
catenin, odorant receptors, and protocadherins
[43–47]. Allele-biased expression has also been observed in
a number of putative schizophrenia (SZ) and autism
spectrum disorder (ASD) candidate genes,
including A2BP1 (RBFOX1), ERBB4, NLGN4X,
NRG1, NRG3, NRXN1, and NLGN1 [48]. Random
monoallelic expression in the brain is related to
epidemiological features of neuropsychiatric disorders [49,50].
In this study, however, it is still unclear if the
preferential ‘A’ allele expression is associated with the function
of this non-coding RNA.
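
Whether the reported preference for the ‘A’ allele exceeds what chance would produce can be gauged with an exact binomial test. The sketch below rests on two assumptions that are not stated in the text: that 89% of 18 corresponds to 16 tissues, and that under the null hypothesis either allele is equally likely to be the expressed one (p = 0.5). The function name is illustrative only.

```python
from math import comb


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with pmf[k] are included.
    return sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9))


n = 18                   # heterozygous tissues examined (from the text)
k = round(0.89 * n)      # 16; inferred from the reported 89%, an assumption
print(k, round(k / n, 3), binom_two_sided(k, n))
```

Under these assumptions the two-sided p-value is roughly 0.0013, so the skew towards ‘A’ would be unlikely by chance alone. The test says nothing about mechanism, and it ignores any imbalance in which allele happened to be paternally inherited among the heterozygous cases.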

Several genes undergo aberrant imprinting in cancers
[51–53]. The most extensively studied example is the
paternally expressed IGF2 gene. In many tumours, both
parental copies of the IGF2 gene may become fully
expressed [54–56]. Reactivation of the normally
suppressed (imprinted) maternal IGF2 allele, known as loss
of imprinting (LOI), is a hallmark of many human
tumours, especially childhood tumours [54–61] and
cancer stem cells [62]. In this study, we did not observe
biallelic expression of IRAIN in tumours. However, we did
show an IRAIN epigenetic abnormality in breast tumour
specimens. In normal tissues, IRAIN lncRNA is
expressed from the paternal allele. However, in breast
cancer tissues, the expression of IRAIN lncRNA can switch
to the alternative parental allele (Fig. 5A). The
mechanisms underlying this allele-switch in breast cancer are
not known. It is also unclear if this aberrant allelic switch
will affect the activity of the IGF1R signal pathway in
breast cancer. CRISPR-Cas9 RNA-guided genome editing has
been recently used to study gene function [63,64], and it
would  be  interesting  to  learn  if  knockdown  of  IRAIN 
using  this  approach  would  affect  IGF1R  expression  and 
thus  the  activity  of  the  IGF  signal  pathway  in  tumours. 

Allelic expression of sense and antisense RNAs is
usually coupled via a cis transcription competition
mechanism. A typical example is the mouse sense Igf2r
coding RNA and the Airn antisense non-coding RNA,
which are reciprocally imprinted [65–67] and tightly
coordinated by DNA methylation in the Airn promoter
[68]. The maternal Airn promoter is silenced by CpG
island hypermethylation. Lack of the Airn lncRNA
cis-competition leads to the expression of Igf2r from
the maternal allele. In contrast, the unmethylated
paternal Airn promoter leads to lncRNA expression, silencing
the Igf2r promoter using a cis regulation mechanism
[68,69]. In this study, however, we found that allelic
expression between the IRAIN lncRNA and the IGF1R
coding RNA is totally uncoupled. While the IRAIN
lncRNA is monoallelically expressed (Fig. 3), the IGF1R
coding mRNA is known to be biallelically expressed
[26,70,71]. The fact that both IRAIN antisense
lncRNA and IGF1R sense RNA are transcribed from
the paternal chromosome without transcription
competition or inhibition may provide a unique model to study
imprinting mechanisms [66,67].

In summary, we have identified IRAIN as a novel
maternally imprinted lncRNA located within the human
IGF1R locus. In breast cancers, IRAIN undergoes
aberrant allelic switching. However, many questions remain
to be explored regarding this aberrant imprinting. For
example, what is the impact of IRAIN expression/
imprinting in the development of breast cancers? Could
the aberrant allele-switch of IRAIN lncRNA be a
prognostic biomarker? Is IRAIN lncRNA a predictive
marker for IGF1R targeted therapies? Does the
down-regulation of the IRAIN lncRNA in TNB and HER2+
samples correlate with clinical outcomes? Future studies
are needed to address these questions. Detection of
aberrant IGF2 imprinting in circulating leucocytes
represents a valuable biomolecular marker for predicting
individuals with high risk for colorectal cancer [38]. It
would be interesting to learn whether the aberrant
allele-switch of IRAIN lncRNA can be utilised as a
prognostic biomarker to assess breast cancer risk.